

ELEMENTARY STATISTICAL METHODS
IN PSYCHOLOGY AND EDUCATION

Paul Blommers and E. F. Lindquist
State University of Iowa

UNIVERSITY OF LONDON PRESS LIMITED
Warwick Square London EC4




COPYRIGHT © 1960 BY PAUL BLOMMERS AND E. F. LINDQUIST
ALL RIGHTS RESERVED INCLUDING THE RIGHT TO REPRODUCE




Published by University of London Press Ltd 1965 
First Published by Houghton Mifflin Company Boston Mass 1960 
Printed in U.S.A. by The Riverside Press Cambridge Mass
Bound in Great Britain by Hazell Watson and Viney Ltd Aylesbury Bucks


PREFACE 


This book and the accompanying study manual were designed strictly 
as teaching instruments or learning aids for use in a first course in statistical 
methods. The orientation is toward psychology and education. A fairly
adequate notion of the topical coverage can be acquired by skimming the 
detailed table of contents. The general nature of this book and study man- 
ual is described in the introductory chapter (see particularly the first three 
sections) where it is most likely to be read by the student. 

Courses in statistical methods have been regarded as exceedingly difficult 
by a substantial number of students—even by many who have achieved 
a high level of success in other aspects of their professional work. This is 
probably due not so much to an inadequate mathematical background as 
to lack of practice in close and rigorous thinking. Such students have never 
learned to pay close attention to precise meanings in their reading, or to 
strive for high precision in the expression of their own ideas. In an effort 
to make their courses more palatable to the student, many teachers of 
statistics have eliminated almost entirely any discussion of mathematical 
bases, have "simplified" the treatment by glossing over underlying assump- 
tions and important qualifications, have provided rule-of-thumb procedures 
in the selection of techniques and the interpretation of results, and have 
emphasized the more easily mastered computational procedures rather 
than the interpretive aspects of the course. In the opinion of the writers, 
these instructional practices serve only to defeat their very purpose. They 
not only make it impossible for the student to acquire any real under- 
standing of the techniques and concepts involved, but also deny him the 
satisfaction which accompanies such understanding and deepen his mystification
and frustration by requiring him to memorize and to use stereotyped
procedures which he fully realizes that he does not really understand.
The result is that in his subsequent use of statistics the student is incapable
of reasoning out for himself what procedures are appropriate in novel 
situations or of exercising critical judgment in the interpretation of results
in such situations. These instructional practices evade the real issue,
which is that training in the use of precise and rigorous logic is precisely
what the student most needs, not just a set of half-understood "recipes"
for use in model situations whose counterparts are rarely found in practice,
with the discrepancy more often than not going unrecognized.

This book represents an effort to make a relatively few basic statistical 
concepts and techniques genuinely meaningful to the student, through a 
reasonably rigorous developmental treatment that may be readily under- 
stood by the student and which will hold his interest. It is not intended 
as a general reference book, nor does it include materials for advanced
courses. Instead, a relatively small number of basic statistical techniques 
and concepts have been developed much more thoroughly and systemati- 
cally than is customary in texts with a wider topical coverage. Recognizing 
that many students have poor mathematical backgrounds and are unac- 
customed to the use of precise and rigorous logic, this book attempts to pro- 
vide the needed experience in such reasoning, and to develop all necessary 
concepts from “scratch,” taking no more for granted in the student’s previ- 
ous mathematical training than some facility with first-year high school 
algebra or general mathematics. The result is a much longer book in relation
to the topics covered than typifies elementary texts in this field, but
it is hoped that the expanded treatment will enable the student to master 
the concepts in less rather than in more time. 

Many students in a first course in statistics are prone to take a passive 
attitude in the learning process. Upon meeting concepts they do not 
readily understand, they often resort to the memorization of stereotyped 
interpretations rather than to a persistent and aggressive effort to discover 
underlying meanings. The primary purpose of the study manual accom- 
panying this text is to induce the student to assume a more active and 
aggressive role in learning. The manual is designed to lead the student 
to discover—or rediscover—for himself many of the important properties 
of the techniques considered in the text. In it an effort has been made to 
apply the Socratic method to reinforce the textbook presentations by using 
series of leading questions or exercises which will educe many important
[...] himself. To a certain extent the manual is
[...] the same concepts in another context, in more
[...] provided in the text. It also provides the
[...] on his understanding and mastery of the
[...]. [...] effort has been made in these exercises to reduce
[...] difficulties to a minimum, and to emphasize
interpretational aspects as much as possible.


The question may occur to some readers whether this book and manual
are to be regarded as a revision of an earlier set of teaching materials
prepared by one of the present authors.* The decision to prepare this book and
manual did grow out of the need for a revision of these earlier materials. 
It was decided at the outset, however, to provide a new and different 
treatment in the text, rather than simply revise the earlier book. The 
study manual, on the other hand, may fairly be regarded as a revision of 
its predecessor. Many of the exercises used are based upon those appearing 
in the old manual. 

In a perhaps rather stubborn resistance to the trend in statistical methods
books, we have defined the variance of a sample as the sum of squares
divided by N rather than by N − 1. The only justification of which we
are aware for the latter practice is that certain formulas assume a slightly 
simpler form. It seems important to us that as early as possible the student 
be introduced to the distinction between a sample fact (statistic), a popula- 
tion fact (parameter), and a sample estimate of the latter. These concepts 
are basic in sampling theory, there being no practical way in which the 
latter (sample estimate) can in all situations be eliminated by the device 
of defining the statistic as the estimator. Not only does the "best" estimate 
vary with definition of "best," but, in the case of some parameters, with 
the form of the population distribution as well. The gain in the simplicity 
with which certain formulas may be stated seems to us to be too great a 
price to pay for the loss of one of the best elementary examples of the 
very distinction we feel it important to make, not to mention the problem
of confronting the student with a definition of variance, the logic of which 
he is at the time unprepared to appreciate. The many writers who have 
defined sample variance as the unbiased population estimate have, for the 
most part, used the symbol s² to represent this value. In keeping with
the practice of using Greek letters to represent parameters and Roman
letters to represent statistics, we should have liked to use this symbol to
represent the sample variance as we defined it and the symbol [illegible]² to
represent the unbiased estimate. However, in deference to the student who,
upon turning to another book, might misinterpret the meaning of the s²
he reads there, we requested our publisher to use some distinctive ess,
not Greek, in representing sample variance as defined in this book. The
character selected was the German final ess (𝔰). It is suggested that
instructors in presenting material at the blackboard use either the more
easily written lower-case script ess or the conventional Roman ess in the 
sense in which we have used the German ess throughout the text. 
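
The distinction drawn above between a sample fact and a sample estimate of a population fact can be made concrete with a small computation. The following is only an illustrative sketch: the function names and the eight scores are assumptions made up for this example and do not come from the text.

```python
def sample_variance(scores):
    """Sample variance as defined in this book: the sum of squared
    deviations from the sample mean, divided by N."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / n


def unbiased_variance_estimate(scores):
    """The alternative definition used in many other books: the same
    sum of squared deviations, divided by N - 1."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / (n - 1)


# Hypothetical scores, invented purely for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

as_statistic = sample_variance(scores)            # divisor N
as_estimate = unbiased_variance_estimate(scores)  # divisor N - 1

print("sample variance (divisor N):      ", as_statistic)
print("unbiased estimate (divisor N - 1):", as_estimate)
```

For any collection of N scores the second value equals the first multiplied by N/(N − 1), so the two definitions differ noticeably only when N is small.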

Our goal of a full detailed presentation has led to a long book in spite of 
the restriction placed on topical coverage. We do not believe it to be too
long for a beginning one-semester course, meeting three or four times per 
week, since its length derives from the detail of presentation rather than 


*E. F. Lindquist, A First Course in Statistics and Study Manual for A First Course in
Statistics (Boston: Houghton Mifflin Company, 1938; rev. ed., 1942). 




from the multiplicity of concepts treated. For a strictly minimal course it 
may contain more than can be properly covered. Teachers responsible for 
such minimal courses will, if they desire to use these materials, find it necessary
to either omit certain sections of the book and manual or to make
them optional with the student. Such teachers will, of course, wish to decide 
for themselves precisely which topics should be so treated. However, we 
suggest for consideration the following sections (sections bear the same 
numbers in both book and manual): 3.9, 3.10, 3.11, 3.12, 5.17, 7.10, 8.9,
8.10, 8.12, 8.13, 10.15, 10.22, 13.9, 14.10, 15.6, 1 , and 15.9. In addition,
we suggest for the minimal course the possibility of omitting some or
even all of the formal proofs or derivations provided in the text. 

A special word of explanation is needed regarding Chapter 3. In this
chapter we have defined the various schemes for the symbolic representation 
of numerical data which are used throughout the book. We had some 
slight preference for organizing this material into a unit so that if desired 
it could be assigned or presented as such. We recognize that many teachers 
may prefer not to present material of this type as a unit. Where this is the 
case we simply suggest the omission of the chapter as a chapter and the 
subsequent individual assignment of the sections which comprise it as the
need for them first arises. 

It is impossible in a book of this type to make proper acknowledgment 
of the multitude of sources out of which it developed. What former 
teachers, what writers, what books or articles, what former students led 
us to adopt this or that mode of presentation is no longer possible for us 
to say, but to all of them we owe a debt of gratitude. Specifically, we are 
deeply indebted to Professor David A. Grant, of the University of Wisconsin,
who read the entire manuscript and whose criticisms were of great assistance
in the final revision. We are also deeply indebted to Professor
Leonard S. Feldt of the State University of Iowa, who used his classes
to try out much of the material and whose valuable suggestions were of
great assistance.

Finally, we are indebted to Professor Sir Ronald A. Fisher, Cambridge,
to Dr. Frank Yates, Rothamsted, and to Messrs. Oliver and Boyd Ltd.,
Edinburgh, for permission to reprint parts of Table III from their book,
Statistical Tables for Biological, Agricultural, and Medical Research; to
Cambridge University Press for their permission to reprint Tables 1 and
12 from E. S. Pearson and H. O. Hartley, eds., Biometrika Tables for
Statisticians; and to the Iowa State College Press for their permission to reprint
the table of random numbers from George W. Snedecor, Statistical Methods.


Iowa City, Iowa                                   PAUL BLOMMERS
February 1959                                     E. F. LINDQUIST


CONTENTS 


CHAPTER ONE INTRODUCTION
1.1 The General Nature of Statistical Methods
1.2 The Major Aspects of Instruction in Statistics
1.3 The Nature of This Book and the Accompanying Study Manual
1.4 Studying Statistics

CHAPTER TWO THE FREQUENCY DISTRIBUTION
2.1 Introduction
2.2 Continuous Data: Effect Upon Class Limits
2.3 Graphical Representation of Frequency Distributions
2.4 Selecting the Classes: Generalizing About the Form of a Distribution
2.5 Selecting the Classes: Computation
2.6 Selecting the Classes: Scores Concentrated at Equally Spaced Points
2.7 Selecting the Classes: Markedly Skewed Distributions
2.8 Selecting the Classes: Summary Remarks
2.9 Classifying the Measures and Reporting the Frequencies
2.10 Graphical Comparison of the Variability of Two Frequency Distributions

CHAPTER THREE SYMBOLIC REPRESENTATION OF DATA
3.1 Introduction
3.2 The Representation of Any Collection of Measures or Scores
3.3 Expressing Computational Results in Terms of the Notational Scheme of Section 3.2
3.4 A Scheme for Representing Any Frequency Distribution
3.5 Computation in Terms of the Frequency-Distribution Notational Scheme
3.6 Some Simple Rules Regarding the Summation Operator
3.7 Representation of a Relative Frequency Distribution
3.8 Computational Results in Terms of the Relative Frequency Distribution Notational Scheme
3.9 A Scheme for Representing a Collection of Scores Organized Into Subgroups or Subsets of Scores
3.10 Computational Results in Terms of the Scheme of Section 3.9
3.11 The Situation in Which Two or More Measures Are Available for Each Individual
3.12 Computational Results in Terms of the Notational Scheme of Section 3.11
3.13 Remarks on Statistical Notation

CHAPTER FOUR PERCENTILE RANKS AND PERCENTILES
4.1 Introduction: Rank-Order Scales
4.2 Percentile Ranks: Definition
4.3 Percentiles: Definition
4.4 Notation and Special Percentiles Defined
4.5 Computation of Percentile Ranks Corresponding to Test Scores
4.6 Computation of Percentile Ranks from Grouped Data
4.7 Computation of Percentiles
4.8 Indeterminate Percentiles
4.9 The Use of the Ogive in Estimating Percentile Ranks and Percentiles
4.10 Population Percentile Ranks and Percentiles
4.11 Overlapping Distributions
4.12 Distances Between Special Percentile Points
4.13 Distances Between Special Percentile Points as an Indication of Variation Among Measures
4.14 Comparison of Percentile Ranks Derived for Different Groups

CHAPTER FIVE AVERAGES: INDEXES OF LOCATION
5.1 Introduction: Average as a General Term
5.2 Mode Defined
5.3 Median Defined
5.4 The Arithmetic Mean Defined
5.5 Computing the Mean
5.6 Grouping Error in the Mode
5.7 Some Simple Rules Regarding the Mean
5.8 A Property of the Mean
5.9 A Property of the Median
5.10 Selection of an Average: Representing the Typical Score of a Unimodal Distribution Containing Extreme Scores
5.11 Selection of an Average: Interest Centered on Total Rather than Typical
5.12 Selection of an Average: Case of Multimodal Distributions
5.13 Selection of an Average: Expected Value
5.14 Selection of an Average: Summary
5.15 Joint Use of Averages
5.16 Grouping Error in the Median
5.17 Minimum Information Needed To Determine the Median of a Grouped Frequency Distribution

CHAPTER SIX MEASURES OF VARIABILITY
6.1 Introduction
6.2 The Range Type of Index
6.3 The Deviation Type of Index
6.4 Computation of Variance and Standard Deviation
6.5 Some Simple Rules Regarding the Variance
6.6 Comparison of Q and 𝔰
6.7 Uses of Measures of Variability: Comparing Variability
6.8 Uses of Measures of Variability: Reliability of Measurement or Estimate

CHAPTER SEVEN STANDARD SCORES
7.1 Introduction
7.2 The Concept of Standard Scores
7.3 Transforming Scores into Standard Form
7.4 z-Scores as Linear Transformations of the X-Scores
7.5 Some Properties of the z-Scale
7.6 Other Systems of Standard Scores
7.7 An Example Comparing the X-, z-, and Z-Scales
7.8 Interpreting Standard Scores Derived for Different Reference Groups
7.9 Interpreting Standard Scores Derived from Different Raw-Score Scales
7.10 Test-Battery Composite Scores


CHAPTER EIGHT THE NORMAL CURVE
8.1 Introduction
8.2 The Normal Curve Defined
8.3 Some Properties of the Normal Curve
8.4 Tables of Ordinates and Areas for the Normal Curve (8.3)
8.5 Probability
8.6 The Concept of Probability as Applied to the Outcome of an Uncertain Event
8.7 The Normal Curve as a Probability-Distribution Model
8.8 Examples Showing How Tables May Be Used to Obtain Various Facts About Normal Distributions or Normal Probability Distributions
8.9 Interpolation
8.10 Fitting a Normal Curve to an Observed Frequency Distribution
8.11 The Lack of Generality of the Normal Curve as a Distribution Model
8.12 The Area Transformation (T-Scores)
8.13 Assigning Letter Grades

CHAPTER NINE INTRODUCTION TO SAMPLING THEORY
9.1 The General Nature of Sampling Studies
9.2 Definitions and Basic Concepts of Sampling-Error Theory
9.3 Selecting the Sample
9.4 Sampling Theory as It Applies to the Means of Random Samples
9.5 Sampling Theory as It Applies to the Median of Random Samples
9.6 Sampling Theory as It Applies to a Proportion
9.7 Sampling Theory as It Applies to Differences Between Two Normally Distributed Random Variables
9.8 Approximating Descriptions of Sampling Distributions

CHAPTER TEN TESTING STATISTICAL HYPOTHESES
10.1 The Notion of Indirect Proof
10.2 Testing Statistical Hypotheses: Introductory Remarks
10.3 The Problem of the Principal and the Superintendent
10.4 The Problem of the Principal and the Superintendent: Solution I
10.5 The Problem of the Principal and the Superintendent: A Modification of Solution I
10.6 The Problem of the Principal and the Superintendent: Solution II
10.7 The Problem of the Principal and the Superintendent: Solution III
10.8 The Problem of the Principal and the Superintendent: Solution IV
10.9 The Problem of the Principal and the Superintendent: Solution V
10.10 The Problem of the Principal and the Superintendent: Solution VI
10.11 Choosing the Level of Significance: The Two Types of Error
10.12 Controlling Type II Errors
10.13 The Power of a Statistical Test
10.14 The Arbitrary Aspects of Statistical Tests: A Summary
10.15 Estimating Sample Size
10.16 A Psychological Problem
10.17 A Psychological Problem: Experiment I
10.18 Some Possible Explanations of the Result
10.19 A Psychological Problem: Experiment II
10.20 Reporting the Extreme Area
10.21 A Psychological Problem: Experiment III
10.22 A Problem Involving the Comparison of Two Proportions

CHAPTER ELEVEN INTERVAL ESTIMATION
11.1 Introduction
11.2 Introduction to the Concept of a Confidence Interval
11.3 Definition of a 100γ Per Cent Confidence Interval
11.4 The 100γ Per Cent Confidence Interval for a Population Mean
11.5 The 100γ Per Cent Confidence Interval for the Median of a Normally Distributed Population
11.6 The 100γ Per Cent Confidence Interval for a Population Proportion
11.7 The 100γ Per Cent Confidence Interval for the Difference Between the Means of Two Populations

CHAPTER TWELVE SOME SMALL-SAMPLE THEORY AND ITS APPLICATION
12.1 Introduction
12.2 A New Interpretation of an Old Test Statistic
12.3 The t-Statistic and Its Sampling Distribution
12.4 Degrees of Freedom
12.5 Tables of Areas for t-Curves
12.6 The Use of t as a Test Statistic To Test a Hypothesis About the Mean of a Normally Distributed Population
12.7 The Use of t as a Test Statistic To Test the Hypothesis of No Difference Between the Means of Two Normally Distributed Populations
12.8 Concluding Remarks on the Use of t as a Test Statistic
12.9 Interval Estimation Based on the t-Statistic


CHAPTER THIRTEEN CORRELATION
13.1 Introduction to the Concept of Correlation
13.2 The Scatter Diagram
13.3 The Bivariate Frequency Distribution
13.4 An Index of Correlation
13.5 Some Properties of r
13.6 Linear and Curvilinear Types of Correlation
13.7 Effect of Curvilinearity Upon the Mean z-Score Product, r
13.8 The Calculation of r from the Original Score Values
13.9 The Calculation of r from a Bivariate Table
13.10 Influence of the Variability of the Measures Upon the Magnitude of r
13.11 Remarks Regarding the Meaning of a Given Value of r
13.12 Causal Versus Casual or Concomitant Relationship

CHAPTER FOURTEEN THE PREDICTION PROBLEM
14.1 Statement of the Problem
14.2 A Possible Solution to the Prediction Problem and Its Weaknesses
14.3 A Preferable Solution to the Prediction Problem in a Special Case: Linear Prediction
14.4 Fitting a Prediction Line by the Method of Least Squares
14.5 The Problem of the High School Counselor
14.6 Other Forms of the Prediction Equation
14.7 The Accuracy of Prediction: The Correlation Coefficient as an Index
14.8 The Accuracy of Prediction: The Standard Error of Estimate
14.9 The Concept of Regression
14.10 Regression Terminology
14.11 Prediction of X, Given Y

CHAPTER FIFTEEN SAMPLING-ERROR THEORY FOR LINEAR REGRESSION AND CORRELATION
15.1 Introduction: The Regression Model
15.2 The Sampling Distributions of b and Ȳ
15.3 Testing Hypotheses About β and μY: Examples
15.4 Testing the Hypothesis that ρ = 0
15.5 Establishing Confidence Intervals for β and μY
15.6 The Sampling Distribution of Ŷ
15.7 Confidence Intervals for the μy·x-Values
15.8 Confidence Intervals for Individual Predictions
15.9 Cross-Validation
15.10 The Normal Bivariate Model
15.11 Fisher’s Logarithmic Transformation of r
15.12 Testing a Non-zero Hypothesis About a Population Correlation Coefficient
15.13 Establishing a Confidence Interval for ρ
15.14 Test of the Hypothesis that Two Normal Bivariate Populations Have the Same ρ-Value


APPENDIX A GLOSSARY OF SYMBOLS

APPENDIX B SELECTED FORMULAS AND RULES

APPENDIX C TABLES
Table I Squares and Square Roots of the Numbers from 1 to 1,000
Table II Normal Curve Areas and Ordinates
Table III Table of Normalized T-Scores
Table IV Percentile Rank of a Normalized T-Score
Table V Ten Thousand Randomly Assorted Digits
Table VI Probability Points of t-Curves
Table VII Values of z_r for Various Values of r

INDEX




INTRODUCTION 


1.1 THE GENERAL NATURE OF STATISTICAL METHODS 


Statistical methods are the techniques used to facilitate the interpreta- 
tion of collections of quantitative or numerical data. The variety of things 
that man can measure or count and thereby generate collections of numerical
data is virtually unlimited. These measured or counted things
(characteristics, traits, attributes) usually involve groups of individuals or
objects, although they may also apply to repeated measurements obtained 
for a single individual or object. Consider a few examples. The individuals
or objects may be the rats in a psychologist’s laboratory, the influenza
patients in a certain hospital during a given period of time, the pupils in an
elementary school classroom, the laborers in a particular manufacturing
plant, television tubes of a given size and make, the various types of
containers in which some food commodity is distributed, and so on almost
without end. For the groups of individuals and objects just enumerated,
there are a number of different counts or measurements in which we might 
be interested: in the case of the rats, for example, we might want to know 
the number of times after a period of conditioning that each animal follows 
a particular path in a Y-maze; in the case of the influenza patients, we 
might be concerned with periodic measurements of body temperature; 
with the elementary school pupils, measurements of reading rate might be
our chief interest; perhaps we would want to know the laborers’ gross 
annual incomes; the length of life of the television tubes; and in the case of
the containers, we might wish to gauge consumer preference as indicated 
by numbers sold during a given period. 

To be of value, such collections of numbers require interpretation. Do 
the numbers derived for one group tend to be larger than the numbers 
derived for another similar or related group? Do they tend to vary more 
in magnitude? Is there anything abnormal about them when compared 
with similar numbers derived for some base or reference group? These and 
many other questions may need to be answered. Statistical methods are 
the techniques which are used in the attempt to arrive at the required
answers. 

The orientation of this book is toward the fields of psychology and 
education. This means primarily that the examples used to make the 
material concrete have, for the most part, been drawn from psychology and 
education. Books on statistical methods are oriented toward a variety of 
fields such as business, economics, sociology, medicine, agriculture, and 
biology, in addition to psychology and education. Some statistical tech- 
niques are of much greater importance in some fields of application than 
in others, and in some instances a technique may even be more or less 
unique to a given field of application. But for the most part, statistical 
techniques are of general applicability and the student who masters them 
thoroughly will be able to apply them as well in one area as in another. 
For example, the statistical problems involved in analyzing gains in milk 
production for a collection of cows fed a certain diet are, by and large, the
same problems encountered in analyzing a collection of learning scores for
a group of college students participating in a psychological experiment or a
group of school children engaged in learning some school subject by a
particular method of instruction.

Statistical techniques may be classified in different ways. One scheme
which has proved helpful in bringing to the beginning student some general
overview of the subject is the three-category classification of: (1) descriptive
statistics, (2) statistical inference, and (3) prediction or regression. A
few words should be said about the types of technique which fall into each
of these categories.

(1) It is obviously difficult, if not impossible, to glean pertinent facts
from a large, unorganized collection of numerical data. Ways must be
found to organize the data into some orderly form, to make summary statements
about the general (average) level of magnitude of the numbers involved,
to indicate in some way the extent to which these numbers tend
to be alike or different in magnitude, and to show how they are distributed
in value, that is, whether they are mostly small except for a few that are
large, or whether they are mostly of medium size except for a few that are
large and a few that are small, and so on. Techniques which help to indicate
such facts as these regarding a large collection of numbers are descriptive
in character and fall into the category of descriptive statistics.

Still another type of descriptive statistic has to do with a somewhat 
different kind of collection of numerical data. This collection consists of 
pairs of measures for each member of a group of individuals, such as heights 
and weights for each of a large number of ten-year-old boys. We know 
from casual observation that some relationship exists between height and 
weight scores for the same boy. That is, we know there is a tendency for 
boys who are tall to weigh more than boys who are short. But we also can 
call to mind such exceptions as the “tall and thin” boy or the “short and
fat” one. Just how strong is this tendency toward relationship? Tech- 
niques for assessing the degree of relationship in situations such as this 
also fall within the category of descriptive statistics. 
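
The two kinds of descriptive technique just mentioned, summarizing a single collection of numbers and assessing the degree of relationship in a collection of paired measures, can be sketched in a few lines. This is only an illustration under stated assumptions: the heights and weights are invented, the function names are hypothetical, and the index of relationship is computed as the mean product of standard scores, with the standard deviation taken with divisor N as in this book.

```python
from math import sqrt


def mean(values):
    """General (average) level of a collection of numbers."""
    return sum(values) / len(values)


def std_dev(values):
    """Extent to which the numbers differ in magnitude (divisor N)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


def correlation(xs, ys):
    """Degree of relationship between paired measures, computed as the
    mean product of the paired standard scores."""
    mx, my = mean(xs), mean(ys)
    sx, sy = std_dev(xs), std_dev(ys)
    products = [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)]
    return mean(products)


# Invented heights (inches) and weights (pounds) for six boys.
heights = [50, 52, 54, 55, 57, 59]
weights = [60, 66, 64, 72, 75, 80]

print("mean height:", mean(heights), " standard deviation:", std_dev(heights))
print("mean weight:", mean(weights), " standard deviation:", std_dev(weights))
print("index of relationship:", correlation(heights, weights))
```

An index near +1 would indicate that taller boys almost always weigh more; the "tall and thin" and "short and fat" exceptions pull the index below 1.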

(2) Many research studies are of a type known as sampling studies. In 
such studies relatively small groups of individuals selected from larger 
groups are observed, investigated, or treated experimentally. From the 
results derived from these small groups (samples), inferences are drawn 
about the large groups (populations). In any such study there is always 
the possibility that the sample of individuals used may not be truly repre- 
sentative of the population, since chance factors beyond the investigator's 
control will always determine, to some extent, which individuals constitute 
the sample employed. Hence, any fact derived from a sample must always 
be considered as only an approximation to the corresponding "true" fact,
that is, the fact which would have been obtained had the entire population
been studied. Under certain conditions of sampling, statistical techniques 
are available which enable an investigator to determine what to expect by 
way of error in the inferences he makes about population facts from examin- 
ing corresponding sample facts. Such techniques represent a very important 
aspect of statistical methodology and belong, of course, to the category of 
statistical inference.
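
The way chance determines which individuals fall into a sample, and hence how far a sample fact departs from the population fact, can be illustrated by simulation. The sketch below rests on assumptions that are not in the text: an artificial population of 10,000 scores generated from a normal model with mean 100 and standard deviation 15, samples of 25, and a fixed random seed chosen only so that the run is repeatable.

```python
import random

# Assumed, artificial population of scores (not from the text).
rng = random.Random(1960)
population = [rng.gauss(100, 15) for _ in range(10_000)]
population_mean = sum(population) / len(population)


def sample_mean(pop, n, generator):
    """Mean of one random sample of n individuals drawn without replacement."""
    sample = generator.sample(pop, n)
    return sum(sample) / n


# Draw 200 independent samples of 25 and record each sample mean.
sample_means = [sample_mean(population, 25, rng) for _ in range(200)]

# Sampling error of each sample mean relative to the population mean.
errors = [m - population_mean for m in sample_means]
largest_error = max(abs(e) for e in errors)

print("population mean:       ", round(population_mean, 2))
print("smallest sample mean:  ", round(min(sample_means), 2))
print("largest sample mean:   ", round(max(sample_means), 2))
print("largest sampling error:", round(largest_error, 2))
```

No single sample mean equals the population mean exactly, but the errors stay within a predictable range. Describing that range is the business of statistical inference.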

(3) Finally, suppose that for a large group of individuals we have some 
knowledge of the relationship between a variable Y and some other variable 
X. For example, Y might represent some measure of success as a college 
student and the other variable, X, some measure of success as a high school
student or some measure of general intelligence or scholastic aptitude.
Now suppose that we are confronted with some new individuals for whom
only the X measure is currently available and that we are required to make
for them some estimate or prediction of Y (in this instance, of success as a
college student). The prediction problem consists in using the measure
currently available, together with our knowledge based on previous experi- 
ence with the relationship between the two variables involved, to make 
the best possible estimate of how these new individuals will perform in 
terms of the Y variable. The statistical methods designed to cope with 
this problem fall into the category known as prediction or regression. 
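
One elementary way of attacking the prediction problem is to fit a straight line to the previously observed pairs and use it to estimate Y for a new individual. The sketch below uses the ordinary least-squares formulas for the slope and intercept. The grade-point figures and the names `fit_line` and `predict` are assumptions introduced only for illustration.

```python
def fit_line(xs, ys):
    """Least-squares line Y' = a + b*X fitted to paired observations.

    Returns the intercept a and the slope b.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    b = sum_xy / sum_xx
    a = mean_y - b * mean_x
    return a, b


# Invented records for five former students: X = high school average,
# Y = college average.
high_school = [2.0, 2.5, 3.0, 3.5, 4.0]
college = [1.8, 2.4, 2.6, 3.1, 3.6]

a, b = fit_line(high_school, college)


def predict(x):
    """Predicted Y for a new individual for whom only X is available."""
    return a + b * x


print("intercept:", round(a, 3), " slope:", round(b, 3))
print("predicted college average for X = 3.2:", round(predict(3.2), 3))
```

The fitted line always passes through the point whose coordinates are the two means, so an individual with an average X is predicted to have an average Y.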

Elementary techniques representative of each of these categories are
presented in this text.


1.2 THE MAJOR ASPECTS OF INSTRUCTION IN STATISTICS


Entirely apart from the major purposes of statistics as categorized in 
the preceding section, there are three aspects of statistics which are
variously stressed in introductory books on the subject. One of these has
to do with the mathematical theory underlying the techniques. A second
has to do with the computational procedures involved in the application
of the techniques. And a third has to do with the selection of techniques
most appropriate for a given purpose and set of data, and with the
interpretation of the results.

The foundation of statistical methods is provided by mathematics. 
The mathematical theory of statistics has, in fact, achieved recognition as 
an area of specialization in the general field of higher mathematics. No
longer is it possible to qualify as a statistical expert and be relatively
ignorant mathematically. It remains possible, nevertheless, to acquire 
some very useful information regarding the application and interpretations 
of certain important statistical techniques without studying their mathe- 
matical bases. In this book the mathematical bases requiring a background 
of more than a year or two of secondary school mathematics have been 
omitted in an effort to make the text understandable and the techniques 
available even to students having quite meager mathematical training. 
It is not to be inferred, however, that the treatment is wholly non-mathematical.
The foundation of statistics is mathematics, and to divest a
presentation of all mathematical aspects would amount to “short-changing”
the student. It would leave him ignorant of much of the logic underlying 
the techniques he is seeking to master, and would render him incapable of 
critical interpretation. It would also handicap him in any attempt he 
might make to pursue the study of statistics beyond a most elementary 
beginning and would leave him quite incapable of consulting many valuable 
statistical references. This book, therefore, does not avoid all that is 
mathematical, but it does require by way of background only that degree 
of mathematical sophistication which it is reasonable to expect of even the
most meagerly equipped college student.*

*The equivalent of one year of secondary school mathematics is a reasonable ma… mathematics plu…

The second aspect, that having to do with computational procedures,
is also given rather cursory treatment in this volume. A great variety of
such procedures have been developed, including many which involve the
use of special desk calculators, electric punch-card equipment and, more
recently, high-speed electronic computers. These procedures are so varied
and often so complex that early consideration of them would only confuse
the beginning student and interfere with his attainment of a real understanding
of the principles underlying the techniques. In this book only
the most essential, straightforward, and readily understandable computational
procedures will be considered. The descriptions of these procedures,
moreover, are given not so much for the purpose of developing
computational skill and facility as for the purpose of contributing toward
a fuller understanding of the techniques themselves.

The emphasis in this book, then, is on the third aspect—that is, on 
developing a knowledge of the appropriate technique to select for a given 
purpose and a given set of data, and on the critical interpretation of results. 
For each of the techniques considered, major emphasis will be placed upon
such questions as:

What, within the limits of the mathematical background assumed, are the 
most significant mathematical properties and characteristics of the technique? 
What assumptions are involved in applying it? 

What specific uses may be made of it? In what types of situations is its
application valid? 

What are its major advantages and limitations in comparison with other 
techniques intended for roughly the same purposes? 

How may the results of its application be interpreted? How must this in- 
terpretation be qualified in the light of considerations that may be unique to 
the particular application? 

What common misinterpretations are to be avoided? What common fallacies 
in statistical thinking are related to the use of this technique? 


In short, this book has to do essentially with the interpretation of 
statistical techniques. The mathematical theory of statistics and the 
mechanics of computation are minimized as much as is consistent with this 
major purpose. There are a number of reasons for this distribution of
emphasis. One is that the typical student in a first course in statistics is 
not likely to be engaged in any significant amount of research. Neverthe- 
less, while he may not be an immediate user of statistical techniques, he is 
almost certain to be a consumer of the uses made by others. Certainly, if 
he is to attain any real insight into the problems of his field, if he is to 
inform and keep himself informed about the current research investigations 
and experiments, he must be prepared to read the periodical literature with 
understanding. If only as preparation for such reading, training in statistics
is an essential part of every student’s equipment. Without such training 
much of what he should read professionally will be rendered unintelligible 
by the frequent recurrence of such statistical terms as variance, standard
deviation, standard error, critical region, level of significance, confidence interval,
errors of the first and second kind, correlation coefficient, regression coefficient,
statistical significance, etc. To read such material with comprehension,
the student need have no special skill in computational procedures, but
he must be prepared to evaluate critically the uses that have been made of
statistical techniques by others, and must be able to check their conclusions
against his own interpretations of the results reported. For the few occasions
in which students at this level may need to apply statistical techniques
themselves, either the limited computational procedures described in this
volume will suffice or directions for the preferred procedures can be readily
found in references and handbooks. The student who has achieved an
understanding of the essential nature of a technique will have no difficulty
in following such directions in these sources. As students progress to a
point where they may become engaged in more extensive research of their
own, they will in any event find it necessary to proceed to advanced courses
in statistics in which the more economical computational procedures involved
in large-scale research may be considered at greater length.


1.3 THE NATURE OF THIS BOOK AND THE ACCOMPANYING STUDY MANUAL


This is a long book, yet it treats only the elementary statistical tech- 
niques. Many statistics books which are no greater in length have a much 
wider topical coverage. Such books are usually intended to serve in a dual 
capacity as both teaching instruments and general reference books. Be- 
cause of practical limitations of space, authors of such books frequently 
find it necessary, in order to achieve the topical coverage demanded by a
general reference work, to give rather cursory treatment to many of the 
topics. This book makes no pretense of serving the general reference func- 
tion. It was written solely as an instructional tool. Its length derives 
primarily from an attempt to provide a genuinely complete and detailed 
presentation of only such elementary statistical techniques and concepts
as might be regarded appropriate for consideration in an introductory
course. In deference to the presumed lack of mathematical background of
many potential users, the "spelled-out" accounts of the techniques and
concepts are presented largely in words rather than symbols, a practice
which makes for a still longer book. It is believed, however, that the serious
student who will patiently study the sometimes rather lengthy presentations
will find this form of treatment a genuine aid toward a mastery of the
topics involved.

Furthermore, this book is only one part of what is intended to be a
two-way approach to learning statistics. Accompanying the book is a study
manual containing problems and questions of a character designed to assist
the student to rediscover for himself many of the significant properties,
aspects, and underlying assumptions of the concepts and techniques presented
in the text. These problems and questions are organized by chapters,
and within chapters, in such a way as to follow much the same sequence of
presentation as the text itself. They will suggest a large number and
variety of concrete situations, illustrating the uses and limitations of each
technique; and will draw attention as well to the effect upon the interpretation
of results of the basic assumptions underlying the derivation of the
technique. The student, by developing these illustrations and by formulating
and stating in his own words the generalizations which they support,
will in a sense develop a second text of his own writing which will contain
many of the important principles and concepts of the original book. Properly
used, then, this study manual will provide a second presentation of at
least some of the major concepts of the book.

A special effort has been made in both book and manual to develop in 
the student a critical attitude toward the use of statistical techniques. 
Special stress has been placed upon the limitations of each technique, upon 
the frequent and unavoidable failure of many practical situations to satisfy
all the basic assumptions or requirements of each technique, upon the
manner in which conclusions must be qualified because of such failure,
and upon prevalent misconceptions and fallacies in statistical reasoning.
In a misguided effort to simplify statistics many of these necessary qualifications
have often been ignored in instruction, and the student has been
provided with a number of rule-of-thumb procedures and stereotyped
interpretations which, because of the numerous exceptions to them, get
him into more difficulties in the long run than they help him to avoid.
Statistical techniques are an aid to, not a substitute for, common sense.
Each technique is designed for a certain purpose and for use under certain
conditions only. When these conditions are not satisfied, the application
of the technique may and often does lead to conclusions that are obviously
contradictory to common sense. It is because of such abuses of statistical
techniques that people have developed a distrust of statistics and statisticians.
In using these instructional materials, then, the student should
strive consciously to develop in himself a highly critical attitude toward
statistics and to be constantly vigilant against the tendency to overgeneralize
or to depend unduly upon stereotyped interpretations.


1.4 STUDYING STATISTICS

Many students will undoubtedly be inexperienced in reading material 
of the type represented by certain sections of this book. Statistics has to 
do with the analysis of numerical data. Obviously, then, the ideas, con- 
cepts, and techniques involved will be quantitative in nature. Since the 
most efficient method of presenting or dealing with quantitative concepts
is through the use of symbols, it follows that the present exposition will 
become at times rather heavily symbolic. Relatively few of the students 
most likely to use this book will be experienced in reading materials that
deal primarily with quantitative concepts, and fewer still will be practiced 
in reading material that involves much use of symbolic expression. 
Perhaps the thing that most discourages the unpracticed reader of
materials of this type is the failure to achieve full comprehension on a first
or even a second reading. Many students are accustomed to covering
reading assignments with a single reading carried on at a rate of 30 to 40 or
more pages an hour. To encounter reading material which requires painstaking
study, which indeed may require several readings, is for them a
new experience. Unaware that such material often does not come easily
even to the most practiced reader, they conclude that the material is beyond
their reach and give up in their attempts before they are actually well
started. They capitulate without attempting to learn not because of an
unwillingness to make the attempt but rather because they fail to realize
what the attempt involves.

Possibly the best advice that can be offered to the beginning student
of statistics is to slow down. The student must approach the subject
knowing that mastery is not likely to be achieved as the result of a single
reading. In studying this book, it is not a bad idea to have a pencil and
scratch paper at hand. One of the best ways to check one's understanding
of a concept is to verify the results of the illustrative examples. Furthermore,
because of the enormous amount of condensation achieved by the
use of mathematical symbols, it is always possible, in reading a given
formula or symbolic expression, to overlook some crucial notation; writing
the formula down on paper is a good way to fix each element in mind. From
time to time the student may find it helpful to outline the steps in his own
reasoning about a concept or to sketch a diagram or figure as an aid to his
own thinking. He may also find it helpful to develop his own glossary of
statistical terminology and to write his own summaries of the ideas studied.
Such note-taking procedures can prove to be a highly efficient form of
“re-reading.”

In the same way, use of the study manual should be most helpful. The
questions and problems in the manual follow the same sectional organization
as the text itself. They are designed to lead the student to discover for
himself, independently of the text, at least some of the most important
ideas presented in the text. At the same time, they provide a check on the
student’s mastery of the exposition in the text. It will sometimes happen
that the student will feel he has fully unde… when actually his underst…
… of checking how adequately … have been grasped.

… ging to the novice. However … promise, a promise that if the …
patience, realizing that others … solved them, he will eventually …
… this should not discourage him; no statistician trusts his memory in such
matters. The important point
is that once the student has achieved an understanding of the general
… and underlying assumptions of the statistical techniques presented,
… formulas and computational routines will all fit into a logical
whole, and statistics will become for him not a mysterious jumble of symbols
and numbers cluttering up the pages of learned articles and books but
rather an instrument for organizing and deepening his perception of the
infinitely various collections of enumerated data with which he will continue
to be confronted throughout his personal and professional life.




THE FREQUENCY 
DISTRIBUTION 


2.1 INTRODUCTION 


Anyone who has worked with a large mass of numerical data knows 
that it is extremely difficult to make sense out of the individual numbers 
in the unordered form in which they were originally collected. Table 2.1,
for example, contains the scores made by 100 high school pupils on a 200- 
word spelling test. Each score represents the number of words correctly 
spelled. 


TABLE 2.1  Scores of 100 High School Pupils on a 200-Word
Spelling Test


en B6 ва о 97 1 174 105 133 139 
188 93 12 123 i06 gs 05 80 93 63 
55 p E od 88 112 170 87 154 120 
vid ; 31 1% à ' É an 121 
їй о m у и 80 — 9092 109 — 138 


89 — 146 i 73 15 

m 146 ж — 94 10 pa a т i34 190 
m E. iz T9 1% 153 12 išp 132 65 
i i ы © 9 10 $9; m 108 1% 
3 38 93 — 128 92 98 108 112 67 68 
45 86 12 в i 76 157 0 134 96 




While these 100 scores certainly do not constitute a very large mass of
numerical data, it is nevertheless obviously impossible to hold even this
number of scores in mind at once. To make any generalization about group
performance from a quick inspection of these scores is extremely difficult.
Certain characteristics of the group can, of course, be noted. It is not difficult
to determine that no pupil made a perfect score, that “relatively few”
pupils spelled more than 150 words correctly, that every pupil spelled some
words correctly, that a "good many" of the pupils scored between 110
and 150, etc., but such statements do not constitute a description of the
group as a whole that is either very useful or accurate, nor do they provide
an adequate basis for the evaluation of the performance of any individual
member of the group in relation to the performances of the other members.
To add very much to the precision and meaningfulness or usefulness of this
description would require a painstaking and tedious “hunt-and-count”
process. Through such a process it is possible to find the lowest and highest
scores in Table 2.1, or to determine exactly how many pupils scored above
150 or any other score, or to find the exact number of pupils who scored
between 110 and 150 or between any other pair of values, or to identify the
most frequently occurring score, etc. The student has only to try to do
these things for himself to discover how time-consuming the process is, how
inaccurate it is likely to be, and how inadequate it is, after all, for the purpose
of providing a composite mental picture of the group performance.

Here we shall illustrate the “hunt-and-count” process only as it might
be applied to evaluate the performance of a pupil scoring 150 in relation to
the performances of the other pupils in the group. While more detailed
information might be desirable in making the evaluation, it is at least essential
to know how many members of the group made scores on this test
which were higher than 150, how many made scores of 150, and how many
made scores below 150. As a first step it is necessary to “hunt-and-count”
all scores exceeding 150 and as a second step to “hunt-and-count” scores of
150. Although the number of scores below 150 may now be determined by
subtracting these two counts from the total number of scores (100), it is
perhaps better, in the interest of accuracy, also to “hunt-and-count” the
scores below 150. The fact that the sum of these three counts must be 100
may then be used as a check against the possible overlooking of one or more
scores. The results of this process are presented in Table 2.2. While many
pertinent questions remain unexplored, it is at least clear that a score of
150 represents a rather high level of performance among the members of
this particular group since it was exceeded by only 15 per cent of the pupils
involved. But the highly limited usefulness of this analysis is patently
clear. It provides no information enabling us to evaluate in a comparable
way the performance of a pupil making a score other than 150. Nor does
it provide a basis for answering other types of questions such as were
previously suggested. What is needed is some way of classifying or arranging
the scores so as to facilitate the task of interpreting them as a group
and in a generally more useful manner.

TABLE 2.2  Relative Value of a Score of 150

CATEGORIES      f
Above 150      15
150             1
Below 150      84
              100
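
The "hunt-and-count" procedure, together with its built-in check that the three counts account for every score, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `hunt_and_count` and the short list of scores are hypothetical and are not the scores of Table 2.1.

```python
def hunt_and_count(scores, target):
    """Count the scores above, equal to, and below a target score."""
    above = equal = below = 0
    for s in scores:
        if s > target:
            above += 1
        elif s == target:
            equal += 1
        else:
            below += 1
    # The three counts must add up to the number of scores; this is the
    # check against overlooking a score that the text recommends.
    assert above + equal + below == len(scores)
    return above, equal, below


# Hypothetical scores chosen for illustration only.
sample_scores = [174, 105, 133, 150, 93, 88, 112, 170, 87, 154, 120, 63]
above, equal, below = hunt_and_count(sample_scores, 150)
print("Above 150:", above)
print("150:      ", equal)
print("Below 150:", below)
```

Note that the whole count must be repeated for every new target score, which is exactly the limitation the text points out.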

One possibility would be to rearrange the scores in order of their size, 
from highest to lowest. With such rearrangement it would be very much 
easier to note the highest and lowest scores, or to count the number of 
scores between two given values, or to evaluate any given score in relation 
to the other scores by noting roughly how far down in the list it occurs, etc.
Rearrangement of the scores in this manner, however, would not only 
require a considerable amount of time, but would still not enable one to 
note quickly and easily the essential characteristics of the performance of 
the pupils as a group. 


TABLE 2.3  Frequency Distribution of the Spelling Scores of
Table 2.1 (Intervals of One Unit)

  X   f      X   f      X   f      X   f      X   f      X   f
191   1    168        145   1    122         99         76   1
190        167        144        121   2     98   1     75   1
189        166        143        120   2     97   1     74
188        165        142        119         96   3     73
187        164   1    141   1    118         95   1     72
186        163   1    140        117         94   2     71   1
185        162        139   2    116         93   3     70
184        161        138   3    115   1     92   3     69
183        160        137   2    114         91         68   2
182        159   2    136        113         90   1     67   1
181        158        135        112   5     89   2     66
180        157   1    134   2    111         88   1     65   2
179   1    156   1    133   1    110   1     87   2     64
178        155        132   2    109   1     86   1     63   1
177        154   1    131   2    108   2     85   1     62
176        153   2    130        107         84         61
175        152   1    129        106   1     83   1     60
174   1    151        128   1    105   2     82   1     59
173        150   1    127        104         81         58
172        149        126   3    103   1     80   1     57
171   1    148   1    125        102   2     79   1     56   2
170   1    147   1    124        101   1     78         55
169        146   3    123   1    100         77         54




A better procedure would be to list in order of size all possible score 
values within the range of all the scores obtained, and then to indicate after 
each score value the number of times it occurs, as has been done in Table 
2.3. It is immediately evident that this form of arrangement markedly 
facilitates interpretation. The more frequently occurring scores stand out 
clearly, as do the segments of the scale in which the scores are most heavily 
concentrated. The total number of scores may be quickly secured simply 
by adding the numbers in the frequency column, and the number of scores 
between any given values can likewise be readily obtained through simple 
addition, etc. But most important is the fact that this form of table shows
clearly how the scores are distributed along the score scale. The latter advantage
would be more evident were the table not so bulky and were the
scores arranged in a single vertical column (which is the usual practice) in- 
stead of in six separate columns as the limitations of space here necessitated. 
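
The procedure just described (list every possible score value within the range of the obtained scores, then record after each value the number of times it occurs) might be sketched as follows. The function name `unit_frequency_table` and the eight scores are assumptions made for illustration; they are not taken from Table 2.1.

```python
from collections import Counter


def unit_frequency_table(scores):
    """Return (value, frequency) pairs for every integer value from the
    highest obtained score down to the lowest, including values that
    never occur (frequency 0)."""
    counts = Counter(scores)
    return [(value, counts[value])
            for value in range(max(scores), min(scores) - 1, -1)]


# Hypothetical scores chosen for illustration only.
sample_scores = [96, 93, 98, 93, 96, 92, 96, 95]
table = unit_frequency_table(sample_scores)

print("  X   f")
for value, f in table:
    # Zero frequencies are left blank, as in Table 2.3.
    print(f"{value:>3} {f if f else '':>3}")

# Adding the frequency column recovers the total number of scores.
print("N =", sum(f for _, f in table))
```

Listing the values that never occur is a deliberate choice: it keeps the rows evenly spaced along the score scale, so that gaps and concentrations are visible at a glance.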

The bulkiness of Table 2.3 is a serious disadvantage. With the scores 
distributed over so wide a range, considerable space is needed to list all 
possible values, and as the presentation is thus strung out, meaningful
characteristics of the collection of scores still remain difficult to grasp. 
This fact suggests that the interpretation would be further facilitated if 
Table 2.3 were condensed by indicating the number of scores falling within 


TABLE 2.4  Frequency Distribution of the Spelling Scores of
Table 2.1 (Intervals of Three Units)

      X      f          X      f
189-191      1    120-122      4
186-188      0    117-119      0
183-185      0    114-116      1
180-182      0    111-113      5
177-179      1    108-110      4
174-176      1    105-107      3
171-173      1    102-104      3
168-170      1     99-101      1
165-167      0      96-98      5
162-164      2      93-95      6
159-161      2      90-92      4
156-158      2      87-89      5
153-155      3      84-86      2
150-152      2      81-83      2
147-149      2      78-80      2
144-146      4      75-77      2
141-143      1      72-74      0
138-140      5      69-71      1
135-137      2      66-68      3
132-134      5      63-65      3
129-131      2      60-62      0
126-128      4      57-59      0
123-125      1      54-56      2




TABLE 2.5  Frequency Distributions of the Spelling Scores of
Table 2.1 (Intervals of Five, Ten, Twenty, and
Fifty Units)

A. INTERVALS OF 5 UNITS      B. INTERVALS OF 10 UNITS
      X      f                     X      f
190-194      1               190-199      1
185-189      0               180-189      0
180-184      0               170-179      4
175-179      1               160-169      2
170-174      3               150-159      9
165-169      0               140-149      7
160-164      2               130-139     14
155-159      4               120-129      9
150-154      5               110-119      7
145-149      6               100-109     10
140-144      1                 90-99     15
135-139      7                 80-89     10
130-134      7                 70-79      4
125-129      4                 60-69      6
120-124      5                 50-59      2
115-119      1
110-114      6               C. INTERVALS OF 20 UNITS
105-109      6                     X      f
100-104      4               180-199      1
  95-99      6               160-179      6
  90-94      9               140-159     16
  85-89      7               120-139     23
  80-84      3               100-119     17
  75-79      3                 80-99     25
  70-74      1                 60-79     10
  65-69      5                 40-59      2
  60-64      1
  55-59      2               D. INTERVALS OF 50 UNITS
                                   X      f
                             150-199     16
                             100-149     47
                               50-99     37


equal intervals along the score scale, instead of indicating the number of
times each integral value occurred. This has been done in Table 2.4. In
this table each interval is identified in the X column (X represents scores
or measures) by the highest and lowest scores in the interval, and each
frequency value (f column) indicates the total number of scores contained
in the corresponding interval. In this table each interval includes three
units along the score scale; any other size of interval could, of course,
have been employed.

Obviously, the degree of compactness in a table of this kind will depend 
upon the size of the interval into which we decide to classify the scores. 
We can secure successive degrees of compactness, for example, by using 
intervals of 5, 10, 20, or 50 units as shown in the four frequency distribu- 
tions of Table 2.5. 
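
The effect of the interval size on compactness can be tried out with a short Python sketch. It rests on assumptions that are not part of the text: the function name `grouped_frequency_table` and the eleven scores are hypothetical, and each interval is taken to begin at a multiple of the interval size, which happens to agree with the limits used in Tables 2.4 and 2.5.

```python
from collections import Counter


def grouped_frequency_table(scores, width):
    """Classify integer scores into equal intervals of the given width.

    Assumption: each interval begins at a multiple of `width`, so that
    with width 10 a score of 96 falls in the interval 90-99.
    Returns (label, frequency) pairs from the highest interval down.
    """
    counts = Counter((s // width) * width for s in scores)
    top = (max(scores) // width) * width
    bottom = (min(scores) // width) * width
    return [(f"{lower}-{lower + width - 1}", counts[lower])
            for lower in range(top, bottom - 1, -width)]


# Hypothetical scores chosen for illustration only.
sample_scores = [191, 174, 150, 146, 133, 132, 96, 93, 92, 65, 56]

for width in (20, 50):
    print(f"Intervals of {width} units")
    for label, f in grouped_frequency_table(sample_scores, width):
        print(f"{label:>8} {f:>3}")
```

The wider the interval, the fewer the rows; whatever the width, the frequencies still add up to the number of scores classified.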

Tables 2.4 and 2.5 differ in one fundamental respect from Table 2.3. In
Table 2.3 each original score is retained intact, that is, the exact value of
each score is indicated. In Tables 2.4 and 2.5, on the other hand, we lose
in varying degrees the identity of the original scores. For example, we may
read in Distribution B of Table 2.5 that there are 15 scores in the interval
90-99, but we have no way of telling from this table how these 15 scores
are distributed within the interval itself. We are, therefore, unable to determine
from this distribution the exact frequency of occurrence of any
single score value. However, we can now more conveniently derive in a
general way an adequate idea of how the scores are distributed over the entire
range. We may note, for example, a tendency for the scores to cluster or
to be most heavily concentrated in two rather widely separated intervals,
namely, 90-99 and 130-139. Moreover, the scores show a tendency to
diminish in frequency to a minimum or "low-point" at the interval midway
between these two, while below and above the frequencies taper off gradually
to values of 2 and 1 for the extreme intervals. This picture of the
scores as a group is more readily discernible from Distribution B of Table
2.5 than from Table 2.4, where the number of intervals remains too great
to overcome adequately the disadvantage of bulkiness for which Table 2.3
was criticized. On the other hand, intervals may be made too coarse. Thus,
in Distribution D of Table 2.5 most of the scores fall into a single interval
and the bimodal character of the distribution of scores (that is, the fact that
the scores are concentrated in two separated intervals) is obscured. In
general, the coarser the interval, the more serious the loss of identity of
individual scores is likely to become.

The size of the interval to be used is thus a matter of arbitrary choice,
dependent upon the nature of the data and upon the uses to which the table
is to be put, that is, upon the kind of interpretations one desires to draw from it.
If high precision in description is desired, if fluctuations in frequency over
small parts of the range are to be studied, and if the number of scores
tabulated is large enough to permit such detailed study, then the interval
used should be as small as three or five, or even a unit interval may be
justified as in Table 2.3. If, however, only a very rough picture of the
distribution of scores is needed, an interval as broad as twenty, or even
fifty (see Distributions C and D of Table 2.5), may prove quite satisfactory.



The purpose of the preceding discussion has been to point out briefly 
and simply some of the major purposes, advantages, and limitations of a 
technique for presenting a mass of numerical data which is known as the 
frequency distribution. A frequency distribution may be more or less formally 
defined as a technique for presenting a collection of classified objects in such a 
way as to show the number in each class. The word class as used in this 
definition corresponds to the word interval as used in the foregoing discussion,
while the word object corresponds to the word score. To classify an
object is to identify the class to which it belongs. The words object and 
class are somewhat more general in that they extend the scheme to applica- 
tion with qualitative as well as quantitative data. (See Tables 2.6 and 2.7.) 


TABLE 2.6  Frequency Distribution of Father's
Occupation for a Group of 70 Boys

Class                   f
Professional            6
Business                3
Clerical and Sales      7
Skilled                22
Semi-skilled           19
Unskilled              13
TOTAL                  70


TABLE 2.7  Frequency Distribution of Ratings
of Management by 100 Employees

Class                   f
Superior               12
…                      22
Very Good              21
Fair                   21
Satisfactory           14
Barely Satisfactory     7
Unsatisfactory          3
TOTAL                 100


The name frequency distribution is clearly appropriate since the scheme
shows the frequency with which the objects are distributed among the various
classes.
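
The general definition (classified objects, with the number in each class) applies equally to qualitative and quantitative data, as a brief Python sketch may make concrete. The function name `frequency_distribution`, the rule passed in as `classify`, and both small data sets are hypothetical and serve only as illustration.

```python
from collections import Counter


def frequency_distribution(objects, classify):
    """Classify each object and count the number of objects per class.

    `classify` is any rule that returns the class to which an object
    belongs.
    """
    return Counter(classify(obj) for obj in objects)


# Qualitative data: each object is already named by its class.
occupations = ["Skilled", "Unskilled", "Skilled",
               "Professional", "Semi-skilled", "Skilled"]
by_occupation = frequency_distribution(occupations, lambda o: o)

# Quantitative data: the class is a 10-unit interval, identified here
# by its lower limit.
scores = [96, 93, 112, 98, 105]
by_interval = frequency_distribution(scores, lambda s: (s // 10) * 10)

print(dict(by_occupation))
print(dict(by_interval))
```

Once the classes are defined, the counting is purely mechanical; all of the judgment lies in the choice of the classifying rule.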


It is clear that a frequency distribution consists of two basic elements: 
(1) the description, identification, or definition of the classes; and (2) the 
frequency counts associated with each class. Given the definitions of the
classes, it is usually nothing more than a matter of clerical labor to classify 
and count the scores or objects to determine the f-values. The task of de- 
fining the classes, however, is another matter. It should be clear from the 
foregoing discussion that no general rule concerning the sizes of the intervals 
or classes can be appropriate for all purposes or for all types of data. It is 




here, then, that judgment enters, and, as in the case with most situations 
calling for sound judgment, it is here that difficulty begins. It is impossible 
to anticipate all the purposes for which frequency distributions might 
possibly be employed as well as all conceivable types of data which might 
be involved. The ability to arrive at sound judgments can come only with 
training and experience, and neither of these alone can take the place of 
constant alertness for the unusual. In later sections of this chapter we shall 
consider the detailed questions that arise in the construction of frequency 
distributions intended for certain specific uses or involving certain specific 
types of data. At most, these considerations can only serve to assist the 
student in acquiring a sound start in the application of the frequency
distribution.


2.2 CONTINUOUS DATA: EFFECT UPON CLASS LIMITS


The numerical data dealt with in statistical work may be classified as 
either continuous or discrete. Continuous data arise from the measurement 
of continuous attributes or variables. An attribute, or trait, or character- 
istic, or variable, is said to be continuous if it is possible for the true amount 
of it possessed by an individual or object to correspond to any conceivable 
point or value within at least a range or portion of an unbroken scale. Any
trait in which individuals may conceivably differ by infinitesimal amounts, 
that is, by amounts approaching zero, is thus a continuous trait. Weights 
or heights of children, for example, may possibly correspond to any con- 
ceivable value within a portion of an uninterrupted scale, and hence are 
examples of continuous variables or attributes. Intelligence, school achieve- 
ment, arithmetic ability, spelling ability, personal adjustment, attitude 
toward racial tolerance, strength, temperature, blood pressure, are further 
examples of continuous variables.

Discrete data, on the other hand, are characterized by gaps in the scale,
gaps for which no real values may ever be found. Thus, though we hear
such statements as “the average college man has 2.7 children,” we know
that in reality children come only in discrete units. Such data are
usually expressed in whole numbers (integers) and ordinarily represent
counts of indivisible entities. Sizes of families, school enrollments, numbers
of books in various libraries, census enumerations, are examples of discrete
data.
It is important to note that the determining factor in distinguishing
between continuous and discrete data is the continuity of the attribute or
trait involved and not of the measurements reported as representing the
amounts of it possessed by the various objects. Thus, the numbers of words
correctly spelled by 100 high school pupils (see Table 2.1) are regarded as
continuous data even though they represent counts of indivisible entities
because the trait involved is a matter of gradual continuing growth and




THE FREQUENCY DISTRIBUTION 


development, and the true spelling ability of an individual may conceivably
be regarded as falling at any point along an unbroken scale. Since two
individuals may differ with respect to a continuous attribute by an in- 
finitesimal amount, and since it is humanly impossible to detect such 
differences, it follows that all measurements of continuous attributes must 
necessarily be approximate in character. It is for this reason that the 
measurements themselves do not provide a basis for distinguishing between 
discrete and continuous data. No matter how precisely we measure, our 
inability to distinguish between points on the scale which are separated by
infinitesimal amounts implies the inevitable existence of unassignable gaps
between the very closest measurements we are able to take. In this sense
our efforts at measurement always lead to discrete results. But our
measurements are actually only approximations of amounts of traits which are
continuous, and hence, in spite of the existence of humanly unassignable 
gaps, these measurements may be regarded as continuous data. 

Ordinarily, measurements of continuous variables are reported to the 
nearest value of some convenient unit. Weights, for example, are usually 
read to the nearest pound, or ounce, or gram, or centigram, depending on
the degree of precision required. Thus, when one weighs himself and finds
the pointer on the scale is closer to 146 than to 145, he reads his weight as
146 pounds. When a person gives his weight as 181 pounds, we interpret
this to mean that his real weight is nearer 181 than 180 or 182 pounds,
that is, that it is actually somewhere between 180.5 and 181.5. Similarly,
heights are measured to the nearest inch, or sometimes to the nearest half
or quarter of an inch, and performance in the hundred-yard dash is timed
to the nearest fifth or tenth of a second. These values in terms of which
measurements are read or reported are known as units of measurement. 
Thus the unit employed in measuring lengths may be one-sixteenth of an 
inch. The fact that one-sixteenth of an inch is a fractional part of another 
familiar unit does not alter the fact that one-sixteenth of an inch may itself 
be used as a unit of measurement; after all, this other familiar unit is itself
a fractional part (one twelfth) of still another unit. 

In a frequency distribution of weights in pounds, then, an interval
identified by the integral limits 160-164 (that is, the limits expressed as the
whole numbers 160 and 164) must be considered as really extending from
159.5 to 164.5 pounds, since 160 represents any real, or true, or actual
weight from 159.5 to 160.5 pounds and 164 any real weight from 163.5 to
164.5. Hence, whenever measurements are taken to the nearest value of the
unit involved, the real limits of a class or interval in a frequency distribution
should be considered as extending one-half of a unit on either side of the
integral limits. The so-called integral limits are actually not limits at all,
but only the highest and lowest unit points within the interval. In fact,
the measurements may be reported in such a way that these integral limits
are not even expressed as integers or whole numbers. Suppose, for example,




that measurements of length are taken to the nearest one-fourth of an inch
and that an interval or class in a frequency distribution is identified as
extending from 59 1/4 to 60 3/4 inches. These values are, of course,
integral limits. The real limits, which extend one-half of a unit (i.e.,
one-half of one-fourth) on either side of these values, are 59 1/8 and 60 7/8
inches respectively.
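
The half-unit rule for measurements taken to the nearest unit can be sketched in a few lines of Python. The function name `real_limits_nearest` and the use of exact fractions are choices made here for illustration only; they do not come from the text.

```python
from fractions import Fraction


def real_limits_nearest(lower, upper, unit=1):
    """Real limits of a class with integral limits lower..upper, assuming
    the measurements were reported to the NEAREST value of `unit`."""
    half_unit = Fraction(unit) / 2
    return Fraction(lower) - half_unit, Fraction(upper) + half_unit


# Weights to the nearest pound, class 160-164.
low, high = real_limits_nearest(160, 164)
print(float(low), float(high))

# Lengths to the nearest quarter inch, class 59 1/4 to 60 3/4
# (that is, 237 to 243 quarter inches).
quarter = Fraction(1, 4)
low, high = real_limits_nearest(237 * quarter, 243 * quarter, unit=quarter)
print(low, high)
```

Working in exact fractions avoids any rounding when the unit is itself a fraction such as one-fourth or one-sixteenth of an inch.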

It is important to note that occasionally measurements of continuous 
variables are reported to the last instead of nearest value of the unit in- 
volved. In the collecting of chronological age data, for example, it is the 
usual practice to express an individual's age in terms of the number of 
years on his last birthday. Thus the actual age of a boy whose reported 
age is 13 years may be anywhere from 13 up to, but not including, 14 years, 
i.e., from 13 to 13.99 years. Similarly, "five years of teaching experience"
could, as such data are often recorded, correspond to an actual period of
experience anywhere from 5 to 5.99 years in length. For measurements of
this type, we must, in order to avoid systematic errors, consider an integral
measure as the lower real limit of a unit interval. The real limits of an
interval or class in a grouped frequency distribution involving such data would
have to be considered as extending from the lowest unit point in the
interval up to, but not including, the lowest point in the next higher interval.
Thus, the real limits of the interval 16-17 would in this case be 16-17.99.
From the foregoing discussion it should be clear that how an interval in a
grouped frequency distribution should be interpreted depends upon the
manner in which the data were collected, or in which the measurements
were taken.

*In terms of the unit involved, integral limits always represent a whole or
integral number of whatever units may be involved. In the quarter-inch
example, the integral limits are 237 and 243 quarter inches, while the real
limits are 236.5 and 243.5 quarter inches.

†It should be noted that there is a trend away from recording measurements
as of the last unit. Many questionnaires now call for age to be reported to
the nearest rather than as of the last birthday.

Usually there is little room for doubt about the manner in which the
data are taken. A possible important exception, however, is to be found
in the scores yielded by all kinds of psychological and educational
examinations. Some writers on statistical procedures have maintained that for
examinations of this type an integral test score should be considered as
representing an interval which extends from the given integral value up
to, but not including, the next integer above, i.e., as a measurement
to the last unit. They would argue, for example, that a score of
17 on the verbal-meaning section of a test should
be interpreted as representing a unit interval beginning at 17; that is,
that while the individual involved must have successfully completed at
least 17 verbal-meaning tasks, he may conceivably have partially completed
an 18th, so that his actual score may be anywhere within the interval
specified. This view, however, is completely inconsistent with the known
fact that errors of measurement due to test unreliability are equally likely 
to occur in either direction. Suppose, for example, that the tasks comprising 
a psychological or educational test are considered as a sample selected to 
represent a larger body or population of such tasks and that it is the
function of the test in question to rank individuals according to the proportion
of tasks in the population with which they can successfully cope. Suppose 
further that two individuals, A and B, actually are able to cope with equal 
proportions of the totality of tasks, but that these proportions are based on 
different tasks; that is, A can cope with some tasks that B cannot, and B 
can cope with some that A cannot. Now actually, A and B should be tied 
in rank and yet, purely because of an accident of chance, the particular 
tasks selected for the test may contain a large proportion of those which A
can solve but B cannot, and a small proportion of those which B can solve 
but A cannot, with the result that A is ranked higher than he actually 
should be, and B lower. This serves to show how errors of measurement in
tests of this character can occur in either direction. For this reason, it is 
suggested that integral scores on all psychological and educational tests and 
scales be considered as midpoints of unit intervals, and that the real limits 
of any interval in a grouped frequency distribution of such scores be con- 
sidered as extending one-half of a unit on either side of the integral limits. 
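
The two reporting conventions discussed above lead to different real limits for the same integral limits. The following is a minimal sketch of that difference; the function name `real_limits` and its `convention` parameter are hypothetical names introduced here. Under the "last unit" convention the upper real limit is returned as an excluded boundary (18 for the interval 16-17), which corresponds to what the text writes as 17.99.

```python
def real_limits(lower, upper, unit=1, convention="nearest"):
    """Return (low, high) real limits of the class lower..upper.

    convention="nearest": each reported value is the midpoint of a unit
        interval, so the class extends half a unit beyond each integral limit.
    convention="last": each reported value is the lower real limit of a unit
        interval, so the class runs from `lower` up to, but not including,
        `upper + unit`.
    """
    if convention == "nearest":
        return lower - unit / 2, upper + unit / 2
    if convention == "last":
        return lower, upper + unit
    raise ValueError("convention must be 'nearest' or 'last'")


print(real_limits(16, 17, convention="nearest"))  # (15.5, 17.5)
print(real_limits(16, 17, convention="last"))     # (16, 18); 18 itself is excluded
```

For test scores, the recommendation in the text corresponds to calling the function with `convention="nearest"`.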

We have noted previously that when we present a collection of measures
in a grouped frequency distribution, we sacrifice information regarding
their individual values. It is often desirable, nevertheless, to be able to
offer some indication of the values of the scores in an interval. Perhaps the
simplest and most common practice is that of using the interval midpoint
as an index of the values of the scores classified in it. This practice has, in
fact, led to the use of the term index value to mean interval midpoint. The
midpoint or index value of any interval is always the point midway between
the real limits, regardless of the manner in which the measurements have
been taken. The midpoint of the interval 16-17 would, in the case of
measurement to the nearest unit, be half-way between 15.5 and 17.5, or 16.5.
On the other hand, if the measurements had been recorded as of the last unit,
the midpoint of this interval would be half-way between 16 and 17.99, or 17.

It would appear, because of the discontinuous character of discrete
data, that the preceding suggestions for determining real limits and
midpoints may not be applied when the data are discrete. Some writers, in
fact, have given special consideration to the construction and interpretation
of frequency distributions of discrete data, and have described modified
procedures for their treatment. These modifications, however, are
rarely, if ever, of any great practical consequence. In this book, therefore,
no distinction will be made in the statistical treatment of continuous and
discrete data, either with reference to the frequency distribution or to
techniques later considered.
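
The rule that the midpoint, or index value, always lies midway between the real limits can be checked with a short sketch. The function name `midpoint` and its parameters are illustrative assumptions; the upper real limit under the "last unit" convention is taken as the excluded boundary `upper + unit`, which the text approximates as 17.99.

```python
def midpoint(lower, upper, unit=1, convention="nearest"):
    """Index value of the class lower..upper: the point midway between
    its real limits under the stated reporting convention."""
    if convention == "nearest":
        low, high = lower - unit / 2, upper + unit / 2
    elif convention == "last":
        low, high = lower, upper + unit
    else:
        raise ValueError("convention must be 'nearest' or 'last'")
    return (low + high) / 2


print(midpoint(16, 17))                     # 16.5
print(midpoint(16, 17, convention="last"))  # 17.0
```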




2.3 GRAPHICAL REPRESENTATION OF FREQUENCY DISTRIBUTIONS

Before turning to specific problems which arise in connection with the
choice of classes or intervals, it will be helpful to consider techniques com- 
monly employed in representing frequency distributions graphically—that 
is, schemes for the pictorial or diagrammatic presentation of the same 
information regarding the distribution of scores that has heretofore been 
presented in tabular form. 

Two such schemes will be presented. The first of these is the histogram.
In this scheme the scale of values of a variable is marked off along a straight
line and rectangles are then constructed above the intervals or classes with
areas equal or proportional to the frequencies associated with the classes.
This type of representation is illustrated in Figure 2.1. The histogram in
Figure 2.1 is based on the frequency distribution given beside it.


Frequency Distribution

Weights to
Nearest Pound
     (X)          f

    95-99         2
    90-94         6
    85-89         5
    80-84        12
    75-79         5
    70-74        16
    65-69        13
    60-64         6

[Histogram: f-Scale (Number of Boys) on the vertical axis; X-Scale (Weight in Pounds), 60 to 100, on the horizontal axis]

Figure 2.1 Frequency distribution and histogram of weights of 70 boys
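
A table of this kind can be tallied mechanically from the raw measures. The sketch below is only an illustration: the function name `grouped_frequencies` is hypothetical, the weights are made-up values rather than the 70 weights behind Figure 2.1, and every score is assumed to be at least as large as `lowest`.

```python
from collections import Counter


def grouped_frequencies(scores, lowest, width):
    """Tally whole-number scores into classes `width` units wide, the first
    class beginning at `lowest`. Returns ((lower, upper), f) rows with the
    highest class first, as in the table above."""
    counts = Counter((score - lowest) // width for score in scores)
    rows = []
    for k in range(max(counts), -1, -1):
        lower = lowest + k * width
        rows.append(((lower, lower + width - 1), counts.get(k, 0)))
    return rows


weights = [61, 66, 67, 72, 73, 74, 78, 81, 83, 92]  # made-up data
for (lower, upper), f in grouped_frequencies(weights, lowest=60, width=5):
    print(f"{lower}-{upper}  {f}")
```

Classes with no scores are kept with a frequency of zero so that the rows remain evenly spaced along the score scale.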


The vertical and horizontal lines at the left and at the bottom of the
figure are known as the axes. In Figure 2.1 the scale along the vertical axis
is that along which the frequencies in the individual intervals or classes are
represented. It is referred to as the frequency scale. The horizontal scale
is likewise divided into a number of equal units, each of which corresponds
to a unit of whatever scale has been employed to measure the attribute
involved, in this case, a pound of weight. This scale is referred to as the
attribute or score scale. The decision to use the vertical and horizontal
axes for these respective scales is an arbitrary one. While the choice made
in Figure 2.1 corresponds to common practice, there are occasions when, for
the sake of convenience, it may be desirable to place the score scale on the
vertical axis and the frequency scale on the horizontal axis. Figure 2.2
shows the histogram of Figure 2.1 with the scales thus reversed.




The base (or side in Figure 2.2) of each rectangle corresponds to an
interval along the score scale and extends from the lower to the upper real
limit of the interval. In this example the height (or length in Figure 2.2)
of each rectangle has been made equal to the frequency of the corresponding
interval in the distribution. This was possible in this example because of
the fact that the intervals involved were of uniform size. Whenever the
intervals of a distribution are of the same size, the bases of the rectangles
of the histogram will also be of the same size. Since areas of rectangles
having equal bases are proportional to their heights, it follows that the
areas of the rectangles will, in such cases, also be proportional to the
frequencies. Making the areas of the rectangles proportional to the frequencies
by making their heights equal to the frequencies has two obvious advantages.
First, it makes for ease of construction. But second, and more important,
it simplifies using the histogram to read the frequency count for
any interval by simply following the guide line provided by the top of the
rectangle to the frequency scale. The scale value thus arrived at will be
the same as the frequency of the interval involved.

[Histogram with scales reversed: X-Scale (Weight in Pounds), 60 to 100, on the vertical axis; f-Scale (Number of Boys), 0 to 15, on the horizontal axis]

Figure 2.2 Frequency histogram of weights of 70 boys

The manner in which a histogram representing a distribution involving
equal intervals is constructed is too obvious to warrant any very detailed
explanation. The f-scale is laid off so as to provide for the largest class
frequency in the distribution. Unlike the X-scale, this scale must always




begin with zero, for to start this scale with a value greater than zero would 
not only have the effect of cutting off a part of the picture, but would make 
it impossible to compare the magnitudes of the frequencies in the different 
intervals by noting approximately how many times higher (or longer) one 
rectangle is than another. While it is common practice to mark off the 
X-scale in such a way as to accommodate an extra empty interval or two
at each end of the distribution, there is no necessity for extending this scale
to zero. To do so would often result in the presentation of a long portion of
the scale where no measures or scores fall. It should be noted that the use


of squared paper (graph paper) will usually make it easier to lay off these 


scales as well as to draw the rectangles. 
Of course, if a particular frequency distribution involves intervals of
varying sizes, the rectangles of the histogram cannot be made with heights
equal to frequencies if the required relationship between area and frequency
is to hold, i.e., if the areas of the rectangles are to be equal or proportional
to the corresponding frequencies. Such distributions are not common, but
are occasionally used in situations in which a finer degree of discrimination
is desired in certain portions of the score scale than in others. This may be
the case when a large number of scores are concentrated in a relatively
small portion of the score scale and the remaining scores are widely
scattered throughout the rest of the scale. An example illustrating the
construction of a histogram for a distribution of this type is presented in a
later section of this chapter.
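
One common way to preserve the area-frequency relationship with unequal intervals is to make each height equal to the frequency divided by the interval width, so that base times height returns the frequency. The sketch below assumes measurement to the nearest unit and uses made-up classes; the function name `histogram_heights` is hypothetical, and the scaling may differ from the construction given in the later section of the chapter.

```python
def histogram_heights(classes):
    """classes: (lower integral limit, upper integral limit, frequency) triples
    for measurements to the nearest unit. Returns (real lower limit,
    real upper limit, height) with height = frequency / width."""
    bars = []
    for lower, upper, frequency in classes:
        low, high = lower - 0.5, upper + 0.5
        bars.append((low, high, frequency / (high - low)))
    return bars


classes = [(50, 59, 8), (60, 64, 12), (65, 69, 15), (70, 89, 10)]  # made-up data
for low, high, height in histogram_heights(classes):
    area = (high - low) * height
    print(f"{low}-{high}: height {height:.2f}, area {area:.1f}")
```

In this made-up example the widest class (70-89) has a frequency of 10 but the lowest bar, since its scores are spread over twenty units of the scale.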

The second type of graphical representation is the frequency polygon.
The frequency polygon shown in Figure 2.3 is based on the same distribution as
the histogram of Figure 2.1.

[Frequency polygon: f-Scale (Number of Boys) on the vertical axis; X-Scale (Weight in Pounds), 55 to 105, on the horizontal axis]

Figure 2.3 Frequency polygon of distribution of weights of 70 boys

The frequency polygon may be considered as having been derived from
the histogram by drawing straight lines joining the midpoints of the upper
bases (tops) of adjacent rectangles. It may, if the intervals are equal, be
constructed without reference to the histogram by locating directly above
the midpoint of each interval along the X-scale a dot at a height equal to
the frequency of the interval and by then joining the successive dots with
straight lines. The figure is closed or brought to the base line (X-scale) by
extending the X-scale to include the empty intervals at each extreme of the
distribution, and by including in the system of dots to be joined the
midpoints of these two intervals. This amounts, in the case of these two
intervals, to making a dot at zero height above their midpoints to correspond to
their zero frequency values.
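
The dots to be joined in a frequency polygon can be listed directly from the table. The sketch below assumes equal intervals and measurement to the nearest unit, in which case the midpoint is simply the average of the two integral limits; `polygon_points` is a hypothetical name. The three classes shown are the three lowest rows of the table accompanying Figure 2.1.

```python
def polygon_points(classes, width):
    """classes: (lower, upper, frequency) triples in ascending order, all
    `width` units wide. Returns the (midpoint, frequency) dots, with a
    zero-frequency dot added for the empty interval at each extreme."""
    points = [((lower + upper) / 2, f) for lower, upper, f in classes]
    first_mid, last_mid = points[0][0], points[-1][0]
    return [(first_mid - width, 0)] + points + [(last_mid + width, 0)]


lowest_three = [(60, 64, 6), (65, 69, 13), (70, 74, 16)]
print(polygon_points(lowest_three, width=5))
# [(57.0, 0), (62.0, 6), (67.0, 13), (72.0, 16), (77.0, 0)]
```

The first dot, at 57, is the midpoint of the empty interval 55-59, which is why the score scale must be extended below the lowest occupied class.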


[Histograms A through H]

Figure 2.4 Histograms showing various forms of frequency distributions

Histograms and polygons are usually constructed for the simple purpose
of displaying in the most readily interpretable manner an over-all picture
of the general way in which the scores are distributed along the score scale.
They reveal, at a glance, what we shall refer to as the form of the
distribution. There are a variety of ways or forms in which scores or measures may
be distributed along a scale, and it will greatly facilitate later discussion if 
some of the more common types can be identified by name. First, it should 
be noted that distributions may be classified as either symmetrical or 
skewed, i.e., asymmetrical. A distribution is said to be symmetrical if the 
figure representing it (polygon or histogram) can be folded along a vertical 
line so that the two halves of the figure coincide. Histograms C, D, E, and 
H of Figure 2.4 are illustrations of symmetrical distributions. If relatively 
minor fluctuations are disregarded, Distribution G may also be classified as 
symmetrical. 
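
The folding test for symmetry can be imitated by comparing the list of class frequencies with its own reverse. This is a rough sketch under the assumption of equal intervals; the function name `is_symmetrical` and the `tolerance` parameter, which stands in for "relatively minor fluctuations", are introduced here and are not part of the text. The frequencies are made up.

```python
def is_symmetrical(frequencies, tolerance=0):
    """True if the class frequencies mirror one another about the centre
    of the distribution, allowing each pair to differ by `tolerance`."""
    mirrored = zip(frequencies, reversed(frequencies))
    return all(abs(a - b) <= tolerance for a, b in mirrored)


print(is_symmetrical([1, 4, 9, 4, 1]))               # True
print(is_symmetrical([2, 9, 6, 3, 1]))               # False
print(is_symmetrical([5, 6, 5, 7, 5], tolerance=1))  # True
```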

If the measures are not thus symmetrically distributed, that is, if they
tend to be thinly strung out at one end of the score scale and piled up at the
other, the distribution is said to be skewed. Distributions A, B, and F of
Figure 2.4 are examples of skewed distributions. Two types of skewness 
are possible. If the scores are thinly strung out toward the right or upper 
end of the score scale and piled up at the lower end, the distribution is said 
to be skewed to the right or positively skewed. When the situation is reversed,
the distribution is skewed to the left or negatively skewed. Distributions B 
and F of Figure 2.4 are positively skewed and Distribution A is negatively 
skewed. Note that the direction or type of skewness is determined by the 
side on which the scores are stretched out rather than by the side on which 
they are concentrated. 

When the scores of a distribution are clearly more heavily concentrated 
in one interval than in any other, the distribution is said to be unimodal. 
In Figure 2.4, Distributions A, B, C, D, E, and F are all unimodal. A, B, 
and F are unimodal and skewed, while C, D, and E are unimodal and 
symmetrical. Unimodal symmetrical distributions are often referred to as 
bell-shaped distributions because when represented by polygons they have 
somewhat the appearance of a cross-section of a bell. Histograms C, D,
and E in Figure 2.4 are bell-shaped but exhibit various degrees of flatness 
or peakedness. 

When the scores are concentrated at one or the other extreme end of 
the distribution, as in F of Figure 2.4, the distribution is said to be J- 
shaped. F illustrates a positively skewed J-shaped distribution. J-shaped 
distributions may, of course, be either positively or negatively skewed. 

A frequency distribution is said to be rectangular to the degree that all
class frequencies tend to have the same value. Histogram G of Figure 2.4 
is an example of a distribution approaching rectangularity. 

Distributions in which the scores are heavily concentrated in two 
distinct parts of the scale, or in two separated intervals, are said to be 
bimodal. H of Figure 2.4 is an example of a type of bimodal symmetrical 
distribution often referred to as a U-shaped distribution. Distributions 
characterized by more than two pronounced concentrations of scores are 
said to be multimodal. Distributions are bimodal or multimodal even 




though the concentrations are not equal. Figure 2.5 is an illustration of a 
bimodal distribution in which the concentration is greater at one part of 
the scale than at the other. 


Figure 2.5 Histogram of a 
bimodal frequency distribution 


2.4 SELECTING THE CLASSES: GENERALIZING ABOUT THE 
FORM OF A DISTRIBUTION


Generalization in statistics usually refers to the act of drawing infer- 
ences about some parent collection of data from a limited collection or 
sample presumably representative of the parent collection. Most research 
studies in psychology and education as well as in other fields involve gen- 
eralizations of this type. That is, measurements or observations are made 
of a limited collection or sample of individuals or objects in order that
generalizations may be established about the still larger collections or
populations that these samples are supposed to represent. Because the 
individuals or objects comprising a population differ from one another, and 
because chance or uncontrolled influences always play some part in de- 
termining which of these differing individuals are to constitute the sample 
used, the characteristics of the sample are almost certain to differ to some 
extent from those of the population itself. Consideration of what may be 
reasonably expected by way of such differences or sampling errors in spe- 
cific situations comprises a major portion of later chapters in this book. 
At this point we are concerned only with a very crude technique which, if 
used with caution, may serve to minimize a certain type of discrepancy 
between sample and population. 

Suppose that we are interested in the manner in which the scores comprising
a large collection, i.e., population, are distributed along the scale
involved, but that for some reason it is highly impracticable, if not
impossible, for us to study all of the scores in the entire population. Any
generalizations must then rest upon a sample, and what is true of the
sample may not be true of the population to which we wish to generalize.
The particular characteristic in which we are interested here is,
of course, the form of the distribution. We shall assume that we are
interested in the form of the distribution only in a very general way. That is,




we wish to know simply whether the distribution is bell-shaped, or posi- 
tively skewed, or negatively skewed, or bimodal, or rectangular, and so on. 

Now when a relatively few scores are classified into a large number of
possible classes, general tendencies are much less likely to be discernible
than when these scores are grouped into a small number of possible classes.
Suppose, for example, that we regard the scores reported in Table 2.1 as a
sample from a large population of such scores. When the number of possible
classes is as great as in the distribution of Table 2.3, or for that
matter, Tables 2.4 or 2.5, it is almost impossible to discern, even with
100 scores, general population characteristics of the type with which we
are here concerned. On the other hand, when the number of classes is
greatly reduced as in Distribution C of Table 2.5, we clearly gain the
impression of a possible bimodal population distribution. To illustrate
further the effect of changing class size upon the appearance of a
distribution, histograms of Distributions A and C, Table 2.5, are shown in Figures
2.6 and 2.7.


[Histogram: f-Scale on the vertical axis; X-Scale, 50 to 200, on the horizontal axis]

Figure 2.6 Histogram of Distribution A of Table 2.5

[Histogram: f-Scale on the vertical axis; X-Scale, 40 to 200, on the horizontal axis]

Figure 2.7 Histogram of Distribution C of Table 2.5
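
The effect of class size on the apparent form of a small sample can be reproduced by tallying the same scores twice, once with fine and once with coarse intervals. The scores below are made up for the purpose and are not those of Table 2.1; `regroup` is a hypothetical helper that assumes whole-number scores no smaller than `start`.

```python
def regroup(scores, start, width):
    """Class frequencies, lowest class first, for classes `width` units
    wide beginning at `start`."""
    n_classes = (max(scores) - start) // width + 1
    frequencies = [0] * n_classes
    for score in scores:
        frequencies[(score - start) // width] += 1
    return frequencies


scores = [52, 55, 57, 58, 61, 63, 64, 66, 83, 85, 86, 88, 89, 91, 94]  # made-up data

fine = regroup(scores, start=50, width=2)     # 23 classes, mostly 0s and 1s
coarse = regroup(scores, start=50, width=15)  # 3 classes
print(fine)
print(coarse)  # [7, 1, 7]
```

With 23 classes for 15 scores no general tendency is visible, whereas the three coarse classes suggest two separate concentrations of scores.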
It is impossible to suggest in the form of a general rule the optimum
number of classes which should be employed when the resulting distribution
is to be used as a basis for making inferences about the general form of
a population or parent distribution. When the number of scores in the
sample is necessarily small,* it is essential that the intervals be coarse and
few in number, say, 5 to 10. On the other hand, if the number of scores in
the sample is large, a somewhat greater number of classes may be used.
It is important to note that the greater the number of scores in the
sample, the less likely are serious discrepancies between sample and
population. Obviously, therefore, the best insurance against attributing
to the population some purely chance characteristic of the
sample is the use of a large sample. It is only when circumstances preclude
the use of large samples that one should resort to the use of a sample
frequency distribution involving coarse intervals to obtain a clue to the
general form of the population distribution.

*The student may well ask, "What is small?" A categorical answer is impossible.
However, it would usually be foolhardy indeed to base even crude
generalizations about the form of a population distribution on fewer than
100 observations.

By way of caution it should be observed that the danger exists of
making the intervals so coarse as to obscure some important population
characteristic. Thus, in Distribution D of Table 2.5, which involves only
three classes, the bimodal feature of the data has been completely obscured,
as can be seen in Figure 2.8, which presents the histogram of this frequency
distribution.

[Histogram: f-Scale, 10 to 50, on the vertical axis; X-Scale, 50 to 200, on the horizontal axis]

Figure 2.8 Histogram of Distribution D of Table 2.5

When the data involved in a sample frequency distribution are
measurements of a continuous attribute, and when the population from which
they come is itself extremely large and composed of individuals representing
all shades of variation in the amount of the attribute possessed, then
it is logical to assume that many of the irregularities of the sample are
actually sampling errors or chance irregularities not truly characteristic of
the entire population. This follows from the notion that if "true" (or at
least extremely accurate) measurements of the attribute involved could
be obtained for all members of the population, the polygon of the resulting
frequency distribution would approach a smooth curve. In order to obtain a
more highly generalized picture, therefore, the practice of "smoothing" the
sample figure is sometimes followed. One simple means of accomplishing
this consists of drawing "free-hand" a smooth curved line which comes as
close as possible to passing through all of the points used in plotting the
polygon, or, in other words, which most nearly fits or coincides with the
outline of the polygon. Such a "generalized" curve is presented in Figure
2.9 for the frequency distribution accompanying Figure 2.1. For purposes
of comparison, the straight-line polygon has been superimposed on the
generalized curve in Figure 2.9.


[Smoothed frequency curve with the straight-line polygon superimposed: X-Scale, 55 to 105, on the horizontal axis]

Figure 2.9 Generalized frequency curve of weights of 70 boys


It should be clearly understood that such smoothing is proper only
when the group of individuals involved is not being studied for its own sake,
but is rather being considered as a sample which is presumably
representative of some still larger group or population. The purpose of smoothing,
then, would be to remove from the polygon for the sample those
irregularities which would not be characteristic of the distribution for the entire
population. The principal danger in this smoothing procedure is that it
sometimes removes irregularities which are not accidental, but which are
real and perhaps significant characteristics of the distribution for the whole
population. There is, of course, no way of telling by inspection whether or
not a given irregularity is accidental.

There are, of course, more objective ways of smoothing figures than
the free-hand method just described. For the simple purpose of describing
the form of the population distribution they are not sufficiently better than
the free-hand method to warrant consideration here. The only highly
dependable method of eliminating accidental irregularities is to collect data
from larger numbers of cases, that is, to plot the results for larger samples.
If certain irregularities disappear as the size of the sample is increased, we
may be quite certain that they were accidental, whereas if they persist we
have increasing assurance that they are truly characteristic of the
population distribution.

It is also of importance to note in this connection that the polygon




provides a more realistic picture of population distributions, even without 
smoothing, than does the histogram. The latter, with its flat-topped 
rectangles, implies an even distribution of the frequencies within a class 
with an abrupt change occurring at the class boundary point. The former, 
with its sloping lines, implies a gradual change in the magnitudes of the 
frequencies which is characteristic of the type of populations we have 
been considering. 


2.5 SELECTING THE CLASSES: COMPUTATION


Occasionally it is convenient to determine or compute certain statistical 
indexes from data which have been organized into a frequency distribution.
In such instances there are certain principles which should be observed in
selecting the classes.

Since information regarding the individual values of the measures is
lost when the measures are organized into a grouped frequency distribution,
and since computations obviously cannot be effected without some
knowledge of the magnitudes of the scores involved, it becomes necessary to assign
some value to the scores classified in a given interval. The value commonly
used is the interval midpoint or index value. It is clear that the accuracy of
any computations based on the use of interval midpoints depends upon
how well these values represent the original magnitudes of the classified
measures. The difference between the value of a statistical index as
computed from a grouped frequency distribution and the corresponding value
as computed from the original unclassified measures is known as grouping
error.
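
Grouping error can be made concrete by computing one index both ways. The sketch below uses the arithmetic mean as the index purely for illustration, assumes measurement to the nearest unit, and works on made-up scores; `grouped_mean` is a hypothetical name.

```python
def grouped_mean(scores, start, width):
    """Mean computed after replacing every score by the midpoint of its
    class (classes `width` units wide beginning at `start`)."""
    total = 0.0
    for score in scores:
        lower = start + ((score - start) // width) * width
        # Midway between the real limits lower - 0.5 and lower + width - 0.5.
        total += lower + (width - 1) / 2
    return total / len(scores)


scores = [61, 62, 66, 70, 71, 74, 78, 83]  # made-up data
raw_mean = sum(scores) / len(scores)
grouping_error = grouped_mean(scores, start=60, width=5) - raw_mean
print(raw_mean, grouping_error)  # 70.625 0.125
```

Here the grouped mean is 70.75 against a mean of 70.625 from the unclassified scores, so the grouping error is 0.125 of a unit.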

Experience has shown that grouping errors are usually not large enough
to be serious if no fewer than 15 classes are used. Actually, a few more than
this, say 20, would ordinarily be preferable. For this reason when certain
computations are to be based on data organized into a frequency
distribution, it is advisable to select an interval of a size that will result in
approximately 20 classes. To determine the size of an interval that will produce
approximately 20 classes it is necessary, of course, only to divide the range
(that is, the difference between the largest and smallest scores) by 20.
Some convenient integral amount near this quotient in value may then be
used as the size of the interval. Integral amount, as here used, means a
whole number in terms of the particular unit of measurement employed.
Thus, if measurements of length to the nearest inch are involved, the
interval should span some suitable whole number of inches. If, however, the
measurements are reported to the nearest quarter inch, the interval should
span some suitable whole number of quarter inches.
There remains, however, 


| | i one additional f 
sidered in selecting the size of an interv 


used in computational work. Since th 


actor whieh should be con- 
al fora frequency distribution to be 
€ scores classified in an interval are 


32 


THE FREQUENCY DISTRIBUTION 


to be regarded as having the value of the interval midpoint, the computa- 
tions will be greatly simplified if these midpoints are themselves integers. 
When measurements are recorded to the nearest unit, the lower real limits
of the intervals are always one-half unit below the lower integral limit.
Consequently, when such data are involved, the midpoints of intervals
spanning an odd number of units will be whole numbers. Hence, in selecting
the interval size, preference should be given to the use of intervals
containing an odd number of units. Of course, this is not always possible. For
example, if the range of a collection of scores is, say, 36, the quotient of 36
divided by 20 is 1.8. The nearest integral value to this quotient is the even
number 2. Here, if we employ the odd-sized interval 3, the resulting
distribution will contain fewer than the advisable minimum number of 15
intervals. Thus we are forced to use an interval of size 2 in spite of the
inconvenience resulting from dealing with fractional midpoints. Multiples
of 10 are also often used as interval sizes since the convenience arising from
the use of the base of our number system is sufficient to offset the
inconvenience of a fractional midpoint.

It is important to note that the odd-sized interval has an integral midpoint
only when the measurements are taken to the nearest unit. If the
measurements are taken to the last unit, the lower real limits are themselves
integers and consequently, in this case, the midpoints of even-sized
intervals are integers. It should also be observed that there is no special
merit in employing precisely 20 classes, the important consideration being
that this number should not drop below 15. This allows considerable leeway
in choosing the interval size. As a general rule, however, the coarser the
interval, the greater the magnitude of the grouping error and hence, in
cases of doubt, the use of too many intervals is preferable to the use of too
few.
Finally, there remains the question of how the intervals should be
placed along the scale. Suppose the largest score in a collection is 63 and
that an interval of size 3 has been determined as appropriate. There are
then three ways in which the intervals may be placed along the scale:
(1) the placement determined by using a top interval having the integral
limits 63-65, (2) that determined by a top interval of 62-64, and (3) that
determined by a top interval of 61-63. Ordinarily, the magnitude of the
grouping error will be about the same for one placement as for another.
Hence, the choice of any one possibility is usually as valid as any other.
Nevertheless there is some merit in uniformity of practice. With this in
mind, most writers recommend positioning the intervals so that
the lower integral limits are multiples of the interval size. This practice
has the possible added advantage of simplifying the clerical work involved.
The application of this convention in the above situation would lead to the
use of the first of the three possibilities suggested, since 63, the lower integral
limit, is a multiple of 3, the interval size.


In this section we have considered certain principles applicable to
establishing the intervals for a frequency distribution to be used in the
computation of certain statistical indexes. These principles may be summarized
as follows: (1) Use at least 15 classes. Actually, 20 classes would
be more nearly an optimum number. (2) If at all possible, the intervals
should be chosen so that the midpoints will be integers. (3) Place the
intervals so that the lower integral limits will be multiples of the interval
size.
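
The three principles can be sketched in Python. This is only one possible reading of the procedure, not a method given in the text: the function name `choose_interval` and its parameters are hypothetical, scores are assumed to be whole numbers recorded to the nearest unit, and when the nearest integral size is even the sketch tries only the next larger odd size before falling back, as in the example with range 36.

```python
def choose_interval(lowest, highest, target=20, minimum=15):
    """Return (interval size, lower limit of top interval, number of classes).

    Hypothetical helper: assumes integer scores recorded to the nearest unit.
    """
    # Principle 1: aim for about `target` classes by dividing the range.
    quotient = (highest - lowest) / target
    nearest = max(1, int(quotient + 0.5))

    def count_classes(size):
        # Principle 3: lower integral limits are multiples of the size.
        bottom = (lowest // size) * size
        top = (highest // size) * size
        return top, (top - bottom) // size + 1

    # Principle 2: prefer an odd size so that midpoints are whole numbers.
    candidates = [nearest] if nearest % 2 == 1 else [nearest + 1, nearest]
    for size in candidates:
        top, classes = count_classes(size)
        if classes >= minimum:
            return size, top, classes
    # No candidate reaches the minimum; report the nearest size anyway.
    top, classes = count_classes(nearest)
    return nearest, top, classes


# Range of 36 (scores from 27 to 63): size 3 gives too few classes.
print(choose_interval(27, 63))
# Range of 99 (scores from 40 to 139): the odd size 5 is acceptable.
print(choose_interval(40, 139))
```

With a range of 36 the sketch rejects the odd size 3 because it yields fewer than 15 classes and settles on size 2, which matches the reasoning of the worked example in this section.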


2.6 SELECTING THE CLASSES: SCORES CONCENTRATED AT 
EQUALLY SPACED POINTS 


Occasionally the measures or scores in a collection tend to concentrate 
at equally spaced points along the scale. This tendency is usually due to 
the way in which the measurements are taken. For example, in rating 
themes on a percentage basis many judges tend to assign values which are 
multiples of five. Another example is to be found in the actual collection 
of scores given in Table 2.8. This table shows a frequency distribution in 
unit intervals of the number of semester hours of course work in the natural 
sciences completed by a sample of 100 juniors enrolled in a certain uni- 
versity.* The individual measures are seen to be clustered around multiples 
of four semester hours, which is what would be expected inasmuch as the 
typical science course at this educational level in this institution is a four 
semester-hour course. It is also important to note that the measures are
expressed in terms of the number of semester hours completed. This being
the case, it would appear best to view them as measurements reported to
the last unit.


Now suppose that it is required to arrange these data into a grouped
frequency distribution which is to be used in computing certain statistical
indexes. Application of the principles discussed in the foregoing section
leads to the selection of an interval of size 2, with the uppermost interval
extending from 36 to 37.99. Beginning with this interval, the midpoints
take the values 37, 35, 33, 31, ..., 1.† These intervals have been designated
by the braces placed along the left-hand scale in Table 2.8. A cursory
examination of the manner in which the individual measures are distributed
within these classes is sufficient to show how poorly their midpoints represent
the actual values of the scores falling in them. In the interval having
the midpoint 13, for example, there are 31 twelves and only 2 thirteens.
This tendency of the scores to be concentrated at one end of the interval is
particularly pronounced in the cases of the intervals having midpoints 1,


*The tally marks which are used as a recording device in classifying the measures
are not ordinarily presented as a part of the frequency distribution. They have been
left in Table 2.8 because they provide a crude graphic picture of the situation.
†Read the three dots "and so on to."


TABLE 2.8 Frequency Distribution Showing Number of Semester Hours of Natural Science Completed

[Table body not recoverable from the scan. Column headings: Units, Tallies by Units, Unit f, Class f. Surviving Unit f entries, in scan order: 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 3, 0, 2, 0, 4, 0.]



5, 9, 13, 17, 21, and 25. Certainly the results of computations based on
these index values will be systematically in error.

While it is possible to eliminate this systematic error by incorporating
a correction or adjustment in the computational procedure, it is ordinarily
preferable to set up the distribution in such a way as to avoid this error in
the first place, even if doing so implies departure from the principles of the
foregoing section. It is obvious that if the class midpoints are to be as
indicative as possible of the magnitudes of the classified scores, they must
coincide with the concentration points. This dictates the use of an interval
of size 4, that is, the distance between concentration points, in spite of
the fact that fewer than 15 intervals will result. Moreover, the intervals
must be placed so that their midpoints are multiples of 4 if the midpoints
are to coincide with the concentration points. The intervals established
according to this scheme have been designated by braces marked off along
the scale appearing on the right in Table 2.8.

It is clear from inspection that the resulting class midpoints
are now more nearly representative of
the classified scores. Computations based on this scheme will be more
accurate, i.e., involve a smaller grouping error, in spite of the fact that fewer
than the recommended number of classes have been employed.
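
A small numerical sketch may make the size of this effect concrete. The counts below are hypothetical, chosen only to concentrate at multiples of four; they are not the data of Table 2.8. The helper `grouped_mean` is likewise an illustrative assumption. It replaces every score by its class midpoint, treating scores as reported to the last unit.

```python
def grouped_mean(counts, size, offset):
    """Mean with every score replaced by its class midpoint.

    Classes have lower limits offset, offset + size, and so on. Scores are
    treated as reported to the last unit, so the midpoint of a class is its
    lower limit plus half the class size.
    """
    total = 0.0
    n = 0
    for score, f in counts.items():
        lower = ((score - offset) // size) * size + offset
        total += (lower + size / 2) * f
        n += f
    return total / n


# Hypothetical unit-interval frequencies (score: frequency).
counts = {0: 3, 4: 10, 5: 1, 8: 20, 9: 2, 12: 31, 13: 2, 16: 15, 17: 1, 20: 4}
n = sum(counts.values())
raw_mean = sum(score * f for score, f in counts.items()) / n

# Size-2 intervals starting at 0: midpoints 1, 3, 5, ...
error_size2 = grouped_mean(counts, 2, 0) - raw_mean
# Size-4 intervals starting at -2: midpoints 0, 4, 8, ... (the concentration points)
error_size4 = grouped_mean(counts, 4, -2) - raw_mean

print(round(error_size2, 3), round(error_size4, 3))
```

For these invented counts the coarser size-4 scheme gives the smaller grouping error in the mean, because its midpoints fall on the points where the scores pile up.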

Occasionally in distributions of this type the range of scores may be so
great that the use of an interval equal to twice the distance between
concentration points can be justified. In such a situation, of course, the
intervals should be positioned so that the two points of concentration falling
within them are balanced about the midpoints.


2.7 SELECTING THE CLASSES: MARKEDLY SKEWED DISTRIBUTIONS 


Occasionally it is necessary to set up a frequency distribution of data
involving extreme skewness. Consider, for example, a collection of measures
of income for a particular group of 1,000 individuals, and for which the
following facts hold:


Largest income: $99,950 
Smallest income: 0 
50% of incomes below: 1,250 
90% of incomes below: 3,000 


Here half of the cases are concentrated between 0 and 1,250; 40 per cent
fall between 1,250 and 3,000; and the remaining 10 per cent are spread out
between 3,000 and 99,950. If a frequency distribution involving these data
is to provide any discrimination at all among the families in the half having
the lowest incomes, a rather fine interval of say 200, or perhaps 250, is
needed. But if intervals of this size are used throughout, the distribution
will contain from four to five hundred classes, an obviously absurd number.
On the other hand, if some practicable number of equal-sized classes is
used, say 20, the bottom class will include all families having incomes below
$5,000. This means that more than 90 per cent of the families will be 
lumped into a single class. It is clear, therefore, that the only way in which 
some distinction can be made among the incomes of families in the lower 
income group, without at the same time using an absurd number of inter- 
vals, is to permit the size of the interval to vary. Just how this should be 
done depends both upon the nature of the data and the degree of discrimination
to be achieved at various parts of the scale. Fine intervals are needed
along those portions of the scale in which the scores are most heavily con- 
centrated and at which the most precise discrimination is required. As the 
density of the scores decreases, fine discriminations become less important 
and the classes may be made increasingly larger. One way in which this 
might be done for the particular collection of data cited above is shown in 
Table 2.9. The right-hand column of Table 2.9 is not ordinarily presented 


TABLE 2.9 Frequency Distribution of 1,000 Individual Incomes in Dollars for the Year 1946

ANNUAL INCOME         f    CLASS SIZE
50,000-99,999         1      50,000
25,000-49,999         2      25,000
20,000-24,999         2       5,000
15,000-19,999         4       5,000
10,000-14,999         5       5,000
 7,000- 9,999         6       3,000
 5,000- 6,999         8       2,000
 4,000- 4,999        14       1,000
 3,500- 3,999        17         500
 3,000- 3,499        41         500
 2,500- 2,999        85         500
 2,000- 2,499       116         500
 1,500- 1,999       124         500
 1,250- 1,499        75         250
 1,000- 1,249        78         250
   750-   999        99         250
   500-   749       104         250
   250-   499       107         250
     0-   249       112         250
                  1,000


as a part of the frequency distribution and has been included only to show
quickly and clearly how the classes have been varied in size. This table
involves 19 classes with relatively fine or narrow intervals being used over
that portion of the income scale where the frequencies are greatest. It thus
presents fairly detailed information regarding the distribution of income
among these 1,000 individuals.

A second distribution illustrative of this situation is presented in Table
2.10. The data shown are the number of years of service rendered by the 361 elementary


and junior high school teachers in a city of approximately 85,000 population.
The years of service were reported as of the last full year completed.
Hence, depending upon when the service first began and upon the time a 
particular report is made, it is possible for a teacher reporting, say 5 years, 
to have actually served anywhere from 5 to 5.99 years. This second distri- 
bution has been presented to direct attention to the distortion introduced 


TABLE 2.10 Frequency Distribution of Number of Years of Service of 361 Teachers of a City School System

YEARS OF SERVICE     f    SIZE OF CLASS (c)    HT. OF RECTANGLE (f/c)
   35-44.99         10          10                      1
   30-34.99         10           5                      2
   25-29.99         15           5                      3
   20-24.99         20           5                      4
   15-19.99         35           5                      7
   12-14.99         27           3                      9
   10-11.99         20           2                     10
    8- 9.99         22           2                     11
    6- 7.99         24           2                     12
    5- 5.99         13           1                     13
    4- 4.99         15           1                     15
    3- 3.99         19           1                     19
    2- 2.99         24           1                     24
    1- 1.99         35           1                     35
    0-  .99         72           1                     72
                   361


as a result of varying the sizes of the classes. A careless glance at the
frequencies in this table might, for example, lead to the erroneous conclusion
that the form of the distribution was bimodal. Such a conclusion results
because the frequency 35 in the upper part of the distribution
represents the total number of teachers reporting either 15, or 16, or 17, or
18, or 19 years of service, while the frequency 35 toward the bottom of the
distribution represents the total number of teachers reporting only one
year of service. The illusion created in this table, which makes large
distances along the score scale [...] may be removed [...]
class frequencies by the simple device of [...] their frequencies. In a situation of this type,


the simplest way of making the areas of the rectangles proportional to their 
respective class frequencies is to demand that these areas be equal to the 
corresponding frequencies. If this is done, the heights of the rectangles can 
be obtained by dividing their areas, that is, their frequencies, by the lengths 
of their respective bases. The appropriate heights of the rectangles representing
each class are shown in the last column of Table 2.10. The histogram
itself is presented in Figure 2.10. As this figure clearly shows, the
distribution is a fairly smooth J-shaped distribution which is skewed to the
right. It should be recognized, of course, that the irregularities in the
original ungrouped data have been eliminated by the use of coarse intervals
in the very portion of the scale in which they were most pronounced.

FIGURE 2.10 Histogram of frequency distribution of years of service of 361 teachers of a city school system (f/c-scale plotted against X-scale)
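
The heights in the last column of Table 2.10 can be reproduced with a few lines of Python. This is a minimal sketch: the variable names are illustrative, and upper limits such as 44.99 are written as the next whole number so that each class size is simply upper minus lower.

```python
# (lower limit, upper limit, frequency) for each class of Table 2.10.
classes = [
    (35, 45, 10), (30, 35, 10), (25, 30, 15), (20, 25, 20), (15, 20, 35),
    (12, 15, 27), (10, 12, 20), (8, 10, 22), (6, 8, 24), (5, 6, 13),
    (4, 5, 15), (3, 4, 19), (2, 3, 24), (1, 2, 35), (0, 1, 72),
]

# Height of each rectangle = frequency divided by the length of its base.
heights = [f / (upper - lower) for lower, upper, f in classes]

# Each rectangle's area (height times base) equals its class frequency.
areas = [h * (upper - lower) for h, (lower, upper, f) in zip(heights, classes)]

for (lower, upper, f), h in zip(classes, heights):
    print(f"{lower:>2}-{upper:<2}  f={f:>2}  c={upper - lower:>2}  f/c={h:g}")
```

Seen this way, the two classes with frequency 35 get very different heights (7 and 35), which removes the apparent second mode.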


2.8 SELECTING THE CLASSES: SUMMARY REMARKS

The foregoing sections dealing with the selection of suitable classes for
frequency distributions should be sufficient to establish our previous
contention that no general rule concerning the sizes of classes
can possibly be appropriate for all purposes or types of data. The situations
treated varied both in the purposes for which the distributions were
prepared and in the types of data involved. It should be clearly understood
that these particular situations do not represent a cataloguing of all
possible situations. They should suffice, none the less, to show how necessary
it is to consider the specific purpose or purposes for which a frequency
distribution is to be used, as well as the type of data involved. These examples
should also show the importance of being constantly on the alert for any
deviation from the ordinary. They should serve as adequate warning
against the two major causes of statistical errors, carelessness and the
blanket application of “rule-of-thumb” procedures without regard to the
peculiarities of the situation involved.


2.9 CLASSIFYING THE MEASURES AND REPORTING THE FREQUENCIES


Once the classes have been selected and listed, the completion of the
frequency distribution is a relatively simple task. Beginning with the first
measure or score in the original unordered list, it is only necessary to
determine in which interval each score belongs, and to place for each a tally
mark in the tally column opposite the appropriate interval. The subsequent
counting will be facilitated if every fifth mark in a row is made slanting
across the preceding four marks (see Table 2.8). Then the number of tally
marks opposite each interval should be counted and the result recorded in
the frequency column. It should be observed that the tally column is a
work column and is not included in the final report of the frequency
distribution.

As a partial check on the accuracy of the tabulations, the numbers in
the frequency column should be added and this sum compared with the
total number of scores in the collection. If these numbers are the same, one
may be reasonably certain that none of the scores has been overlooked in
the classification process or counted more than once. This check, however,
obviously offers no assurance against misclassifications. The only way to
check against possible errors of this type is to repeat the classification
process and to compare the two sets of tally marks and frequency counts.
The beginning student of statistics may feel that careful and painstaking
work in the first instance will serve to obviate the need for such a tedious
repetition. The experienced statistician, however, has long since learned
that errors occur no matter how careful he tries to be; and, when no other
fully satisfactory methods of checking exist, he automatically employs an
independent repetition of a process as part of his standard procedure.

Thus far the frequencies with which the scores fall into the various
classes have been reported only in terms of actual counts. It is not unusual,
however, to report these counts as decimal fractions, or percentages, of the
total number of scores in the collection. These percentages have the
advantage of relating the frequency count associated with a particular interval
to the total number of scores in the distribution and are, hence, known as
relative frequencies.
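
The tallying, the partial check, and the conversion to relative frequencies can be sketched as follows. The function `frequency_distribution` and the ten scores are hypothetical. The sketch assumes whole-number scores and classes whose lower integral limits are multiples of the interval size.

```python
from collections import Counter


def frequency_distribution(scores, size):
    """Return rows (lower limit, upper limit, f, rf), highest class first."""
    # Tally each score under the lower integral limit of its class.
    tallies = Counter((score // size) * size for score in scores)
    total = sum(tallies.values())
    # Partial check: the frequencies must add up to the number of scores.
    # This cannot detect a score tallied in the wrong class.
    assert total == len(scores)
    rows = []
    for lower in range(max(tallies), min(tallies) - 1, -size):
        f = tallies[lower]  # a class with no tallies gets a frequency of 0
        rows.append((lower, lower + size - 1, f, f / total))
    return rows


scores = [63, 61, 58, 57, 57, 55, 54, 52, 51, 49]  # hypothetical scores
rows = frequency_distribution(scores, 3)
for lower, upper, f, rf in rows:
    print(f"{lower}-{upper}  f={f}  rf={rf:.2f}")
```

As the text notes, the sum check guards only against omitted or doubly counted scores. In this sketch, an independent repetition would amount to classifying the scores a second time by a separate route and comparing the two sets of frequencies.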

It is often convenient to arrange the scores in a collection in order of
size or to organize them into a unit-interval frequency distribution. If the
range is large, it may be impossible to list the intervals (i.e., the possible
score values) in a single column so that the classification work sheet may
bulk awkwardly over several columns or even spread beyond a single sheet.
The work sheet may be conveniently arranged into a more compact form by
means of a double-entry format. The columns of such a double-entry
classification table correspond to the units digits of the scores, and the rows
of this table correspond to the number of tens.* A work table of this type
as it would be applied to the data of Table 2.1 is presented in Table 2.11.
In this table, for example, the five scores of 112 included in the collection of
Table 2.1 have been tallied in the cell determined by the intersection of
Row 11 (11 tens) and Column 2 (2 ones); this is the cell which corresponds
to the number 112.

TABLE 2.11 Double-Entry Classification Table (Data Taken from Table 2.1)

[Table body not recoverable from the scan; columns are headed by the units digits.]
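
A double-entry work table of this kind is easy to mimic in code. The sketch below is illustrative only: the function name `double_entry` is hypothetical, and the short list of scores is invented apart from the five scores of 112 mentioned above.

```python
def double_entry(scores):
    """Tally whole-number scores by row (number of tens) and column (units digit)."""
    table = {}
    for score in scores:
        tens, units = divmod(score, 10)
        row = table.setdefault(tens, [0] * 10)  # ten columns, units 0 through 9
        row[units] += 1
    return table


# Hypothetical scores; only the five 112s come from the text.
scores = [112] * 5 + [96, 132, 171, 117]
table = double_entry(scores)

print("tens | " + " ".join(str(u) for u in range(10)))
for tens in sorted(table, reverse=True):
    print(f"{tens:>4} | " + " ".join(str(f) for f in table[tens]))
```

The cell in row 11, column 2 then holds the count 5, corresponding to the five scores of 112.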


2.10 GRAPHICAL COMPARISON OF THE VARIABILITY OF
TWO FREQUENCY DISTRIBUTIONS


Occasionally it is desired to compare the frequency distributions of 
measures of the same trait for two or more different groups of individuals 
or objects for the purpose of determining in which group the measures are 
the more variable or tend to differ more widely in magnitude. Later we 
shall study more precise methods of making such comparisons. Here we 
shall simply be concerned with certain problems associated with the rela- 
tively crude means of accomplishing this purpose provided by the com- 
parative inspection of the polygons or histograms representing the distribu- 
tions involved. The greater the over-all width or range of such a figure in 
relation to its height, the more it appears to suggest considerable variation 
in the magnitudes of the scores. Thus, the histogram of the distribution of 
ages of a group of boys shown in Figure 2.11, which is wider than it is high, 


is much more suggestive of marked variation in the magnitudes of their
ages than is the histogram of Figure 2.12, which is higher than it is wide.
Actually, however, there is no difference in the variability of the scores
comprising the two frequency distributions thus pictured. These distributions
are presented in Tables 2.12 and 2.13. That they are equally variable
is clear from a comparison of the relative frequencies also presented in these
tables. These show identical proportions of ages falling into the corresponding
classes of each of the distributions.

*In statistical tabulations, the horizontal arrays of figures are called rows and the
vertical arrays, columns.

FIGURE 2.11 Histogram of frequency distribution of ages of 25 seventh-grade boys (f-scale plotted against X-scale)

FIGURE 2.12 Histogram of frequency distribution of ages of 200 seventh-grade boys (f-scale plotted against X-scale)

TABLE 2.12 Frequency Distribution of the Ages to the Nearest Month of 25 Seventh-Grade Boys

MONTHS       f     rf
162-164      2    .08
159-161      1    .04
156-158      4    .16
153-155      7    .28
150-152      5    .20
147-149      2    .08
144-146      3    .12
141-143      1    .04
TOTAL       25   1.00

TABLE 2.13 Frequency Distribution of the Ages to the Nearest Month of 200 Seventh-Grade Boys

MONTHS       f     rf
162-164     16    .08
159-161      8    .04
156-158     32    .16
153-155     56    .28
150-152     40    .20
147-149     16    .08
144-146     24    .12
141-143      8    .04
TOTAL      200   1.00
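
The equality of the two sets of relative frequencies can be checked directly. The sketch below uses the class frequencies of Tables 2.12 and 2.13, with the larger group's frequencies taken as eight times the smaller group's. The helper `relative` is an illustrative name, and exact fractions are used so that no rounding enters the comparison.

```python
from fractions import Fraction

# Class frequencies, highest class (162-164) first.
f_25 = [2, 1, 4, 7, 5, 2, 3, 1]          # Table 2.12, N = 25
f_200 = [16, 8, 32, 56, 40, 16, 24, 8]   # Table 2.13, N = 200


def relative(freqs):
    """Relative frequency of each class as an exact fraction of the total."""
    n = sum(freqs)
    return [Fraction(f, n) for f in freqs]


rf_25 = relative(f_25)
rf_200 = relative(f_200)

print([float(rf) for rf in rf_25])
print(rf_25 == rf_200)
```

Because the relative frequencies agree class by class, histograms drawn on an rf-scale would coincide even though the raw frequency histograms differ greatly in height.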

The obvious difficulty arises from the fact that one of the two graphs
involved in the foregoing example is based on eight times as many
individuals as the other. If the distributions are equally variable, the graph of
the one based on the greater number of measures is bound to appear to be
higher in relation to its width than is the graph of the other. Consequently,
unless the user of such diagrams is alert to possible differences in the sizes
of the groups, he may be misled in the conclusions he draws in comparing
them with respect to variability. Fortunately, this difficulty is easy to
remedy. All that is required is that the polygons or histograms to be used
in such comparisons be based on relative frequencies. Such diagrams would
be identical for such distributions as are presented in Tables 2.12 and 2.13.
The relative frequency distribution histogram for these tables is shown in
Figure 2.13.

FIGURE 2.13 Histogram of relative frequency distribution of either Table 2.12 or 2.13 (rf-scale plotted against X-scale)


FIGURE 2.14 Histogram of relative frequency distribution of Table 2.12 (with large-unit distance on rf-scale and small-unit distance on X-scale)

But the size of the group involved is not the only factor other than the
variability of the scores that determines the relative [...] of
polygons and histograms. An even more critical factor is the choice of the
physical distances representing score and relative frequency units. These
are arbitrary and hence actually subject to purposeful manipulation capable
of producing misleading results. By using, for example, a large physical


distance to represent a unit of relative frequency in conjunction with a 
small distance to represent a score unit, the distribution may be made to 
appear highly homogeneous, that is, to consist largely of scores of about the 
same magnitude. The very same distribution, on the other hand, may be 
made to appear highly heterogeneous, that is, to consist of scores varying 
widely in magnitude, by the use of a small physical distance to represent a 
unit of relative frequency and a large distance to represent a score unit. 
Figures 2.14 and 2.15 illustrate what can be done to the appearance of a
histogram as a result of such manipulations. In each case, the relative
frequency distribution pictured is the same, that is, the distribution of Table
2.12.

FIGURE 2.15 Histogram of relative frequency distribution of Table 2.12 (with small-unit distance on rf-scale and large-unit distance on X-scale)

It should be clear, then, that if the variability of two groups of scores 
is to be compared by an inspection of polygons or histograms, not only 
must relative frequencies be used, but also the physical distances repre- 
senting score and frequency units must be the same for both graphs. 




SYMBOLIC 
REPRESENTATION OF DATA 


3.1 INTRODUCTION 


The symbolic notation of statistics makes it possible to state and
[...] statistical ideas more precisely and far more concisely than is
ordinarily possible with common words. The mastery of the notation and
of the rules governing its application is the price that must be paid for this
superior mode of communication. Since mathematics provides the foundation
for statistics it is to be expected that many of the symbols and rules
will be those of mathematics. Others are used, however, which are more or
less unique to statistics. This chapter is primarily concerned with such of
the latter as would be particularly useful to the beginning statistics student.
Definitions are given for symbols and explanations of the rules governing
their application are provided. No knowledge of mathematics beyond a
beginning high school course is presumed.


3.2 THE REPRESENTATION OF ANY COLLECTION OF MEASURES OR SCORES

We shall consider first a notational scheme that will serve to represent
any collection of measures or scores. Since such a generalized scheme must
be capable of representing collections containing varying numbers of scores,
we shall use the symbol $N$ ($n$ is also sometimes used in this sense) to
represent the number of scores involved. Since $N$ represents counts of the
number of scores, it is clear that it is restricted to representing any positive
integer.


The individuals or objects measured will each be assigned an identifying
number. The assignment will be in a purely arbitrary order with one
individual being assigned the identifying Number 1, a second individual the
Number 2, a third the Number 3, etc. The last individual will, of course,
be assigned the Number $N$, that is to say, the number represented by $N$.
As a sort of general designation or identification, we shall use the letter $i$.
This letter, then, represents any integer from 1 to $N$ inclusive.

The score value for a given individual will be represented by an $X$ to
which that individual's identification number is affixed as a subscript.
Thus, $X_1$ represents the score of Individual 1, $X_2$ the score of Individual 2,
etc. The score of the last individual is represented by $X_N$, and the score of
any individual by $X_i$.


There are several ways in which the collection may now be represented.
For example, we may write

$$X_1, X_2, \ldots, X_N \tag{3.1}$$

The dots in this statement should be read "and so on to." An alternative
representation is

$$X_i \quad (i = 1, 2, \ldots, N) \tag{3.2}$$

It should be noted that the choice of symbols used in this scheme is purely
arbitrary. Other letters than $N$, $i$, and $X$ would serve equally well and are,
in fact, often used.

3.3 EXPRESSING COMPUTATIONAL RESULTS IN TERMS OF THE
NOTATIONAL SCHEME OF SECTION 3.2

It is now possible, within the framework of the symbolic scheme of the
foregoing section, to represent the results of the application of certain
computational operations to the scores of any collection. Thus, the sum
of the $N$ scores may be represented by

$$X_1 + X_2 + \cdots + X_N \tag{3.3}$$

To abbreviate this result further, statisticians use the upper-case Greek
letter sigma to indicate summation. Thus, the above sum may be expressed

$$\sum X_i \quad (i = 1, 2, \ldots, N) \tag{3.4}$$

or

$$\sum_{i=1}^{N} X_i \tag{3.5}$$

The symbol, $\sum$, is a sign of operation in the same sense that $+$, $-$, $\times$, or $\div$
are signs of operation. It is called a summation operator or summation sign.


The expressions (3.4) and (3.5) are alternative methods of indicating the
fact that all $N$ scores are involved in this sum. Similarly, the sum of the
squares of any collection of scores may be represented by

$$X_1^2 + X_2^2 + \cdots + X_N^2 \tag{3.6}$$

or

$$\sum X_i^2 \quad (i = 1, 2, \ldots, N) \tag{3.7}$$

or by

$$\sum_{i=1}^{N} X_i^2 \tag{3.8}$$

To illustrate this scheme, let us regard it as applying specifically to the
collection of scores given in Table 2.1. In this case $N = 100$ and

$$X_i \ (i = 1, 2, \ldots, 100) = 132, 171, \ldots, 96$$

The sum of the 100 scores in this particular collection is

$$\sum_{i=1}^{100} X_i = 132 + 171 + \cdots + 96 = 11{,}538$$

The sum of the squares of these scores is

$$\sum_{i=1}^{100} X_i^2 = 132^2 + 171^2 + \cdots + 96^2 = 17{,}424 + 29{,}241 + \cdots + 9{,}216 = 1{,}427{,}186$$

Or, if we are concerned with the subsum or subtotal of only the second ten
scores in this particular collection, we could write

$$\sum_{i=11}^{20} X_i = 126 + 93 + \cdots + 86 = 1{,}218$$


This last example illustrates the need for a means of designating the
values to be included in a desired sum. This is done by indicating the first
and last values of $i$, which in the example are designated by $i = 11$ placed
below and 20 placed above the summation operator. These values of $i$ are
referred to as the limits of the summation which involves $X_{11}$ and $X_{20}$
and all intermediate scores as terms. It is a common practice not to
designate the limits when all $N$ values in a given collection are involved in
a sum, that is, simply to use

$$\sum X_i \tag{3.9}$$

to represent the sum of all the values comprising a given collection. In
this book we shall generally follow the practice of omitting the limits of
summation, that is, we shall be using (3.9). The occasional exceptions to
this policy occur in situations in which some ambiguity might otherwise
exist, or in which there appears to be something to be gained by directing
the student's attention to the precise terms involved in a given sum.
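
The summation notation translates directly into code. In the sketch below the function `summation` and the six scores are hypothetical, since Table 2.1 is not reproduced here. The subscript $i$ runs from 1, so the code shifts to Python's zero-based indexing internally.

```python
def summation(scores, first=1, last=None, power=1):
    """Sum X_i ** power for i = first, ..., last (1-based, both limits included).

    Omitting the limits sums over all N scores, as in expression (3.9).
    """
    if last is None:
        last = len(scores)
    return sum(x ** power for x in scores[first - 1:last])


scores = [132, 171, 96, 126, 93, 86]  # hypothetical collection, N = 6

total = summation(scores)                    # sum of X_i over all i
total_squares = summation(scores, power=2)   # sum of X_i squared over all i
subtotal = summation(scores, 4, 6)           # sum of X_i for i = 4, 5, 6

print(total, total_squares, subtotal)
```

Giving the limits explicit defaults mirrors the convention of the text: they are written out only when a partial sum is intended.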


3.4 A SCHEME FOR REPRESENTING ANY FREQUENCY DISTRIBUTION 
Next we shall consider a scheme for representing any frequency distribu-
tion. Such a generalized scheme must be capable of representing a fre-
quency distribution with any number of classes and involving any number
of scores.

We shall represent the number of classes by the symbol c, and the
number of scores by the symbol N, as before. The symbol c, like N, can
represent only positive integers. Each class will be assigned an identifying
number. Again the assignment is arbitrary, but it is usually convenient to
assign the Number 1 to the highest class, the Number 2 to the next highest,
etc. The last, in this case the lowest, class will then be represented by c.
We shall use the letter j to represent the identification number of any class.
Thus, j represents any integer from 1 to c inclusive.


The score value corresponding to the midpoint or index value of a
given class will be represented by an X to which the identification number
for that class is affixed as a subscript. The frequency for that class will be
represented by an f with the class identification number affixed as a sub-
script. The sum of the class frequencies is, of course, equal to the number
of scores in the entire collection, i.e., N. The complete scheme for repre-
senting any frequency distribution is presented in Table 3.1.


TABLE 3.1   Symbolic Representation of Any Frequency Distribution

    CLASS MIDPOINTS    FREQUENCIES
    $X_1$              $f_1$
    $X_2$              $f_2$
    $X_3$              $f_3$
    ...                ...
    $X_c$              $f_c$

An alternative and highly abridged presentation is

$$X_j,\ f_j \quad (j = 1, 2, \ldots, c), \qquad \text{where } \sum f_j = N \tag{3.10}$$

It should again be observed that the choice of symbols used is arbitrary
and that others would serve equally well.




3.5 COMPUTATION IN TERMS OF THE FREQUENCY-DISTRIBUTION
NOTATIONAL SCHEME


It is now possible, within the framework of this notational scheme, to 
represent the result of the application of certain computational operations 
to any collection of scores organized into a frequency distribution. As was 
explained in Section 2.5, in carrying out such computational operations the 
scores in any interval are assumed to have the same value as the midpoint 
of that interval. To whatever degree the interval midpoints fail to repre- 
sent accurately the scores classified in the intervals, the results of computa- 
tions based on frequency distributions will fail to conform to the results of 
corresponding computations based on the original collection of ungrouped 
scores.

In terms of the scheme under discussion, the sum of the $f_1$ scores in
Class 1 is $f_1X_1$, the sum of the $f_2$ scores in Class 2 is $f_2X_2$, etc.
Thus, the sum of the N scores involved in any frequency distribution may be
represented by

$$f_1X_1 + f_2X_2 + \cdots + f_cX_c \tag{3.11}$$
or
$$\sum f_jX_j \qquad (j = 1, 2, \ldots, c) \tag{3.12}$$
or, if it is understood that all c products are involved, by simply
$$\sum f_jX_j \tag{3.13}$$

Similarly, the sum of the squares of the N scores of any frequency distribu-
tion may be represented by

$$f_1X_1^2 + f_2X_2^2 + \cdots + f_cX_c^2 \tag{3.14}$$
or
$$\sum f_jX_j^2 \qquad (j = 1, 2, \ldots, c) \tag{3.15}$$
or, simply,
$$\sum f_jX_j^2 \tag{3.16}$$


To illustrate, let us regard the scheme as specifically representing the 
frequency distribution in Part B of Table 2.5. In this case, c = 15 and
N = 100. Also $X_1 = 194.5$; $X_2 = 184.5$, etc., while $f_1 = 1$; $f_2 = 0$, etc.
Hence,

$$\sum f_jX_j = (1)(194.5) + (0)(184.5) + \cdots + (2)(54.5) = 11{,}530$$

and

$$\sum f_jX_j^2 = (1)(194.5)^2 + (0)(184.5)^2 + \cdots + (2)(54.5)^2 = 1{,}426{,}945$$
So that the approximate character of the equivalence of these sums to the
corresponding sums derived from the ungrouped scores will be clearly recog-
nized, the relationships between these results as expressed in the symbols
of their respective notational schemes are given below. The limits of sum-
mation have been retained here to direct the student's attention to the fact
that these approximately equivalent sums do not involve the same number
of terms. The sign, ≈, used in stating these relationships should be read
"is approximately equal to."

$$\sum_{j=1}^{c} f_jX_j \approx \sum_{i=1}^{N} X_i \tag{3.17}$$

[11,530 ≈ 11,538]

$$\sum_{j=1}^{c} f_jX_j^2 \approx \sum_{i=1}^{N} X_i^2 \tag{3.18}$$
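
How closely a sum computed from a frequency distribution agrees with the sum
of the ungrouped scores can be seen by carrying out both computations. The
sketch below assumes ten hypothetical whole-number scores and classes ten units
wide (50-59, 60-69, and so on), so that the midpoints are 54.5, 64.5, etc.;
none of these values comes from Table 2.1 or Table 2.5, and the helper name
`class_midpoint` is illustrative only.

```python
# Hypothetical ungrouped scores (not the data of Table 2.1 or Table 2.5).
scores = [52, 57, 61, 64, 66, 68, 73, 75, 79, 88]


def class_midpoint(score, width=10):
    """Midpoint of the class containing `score`, e.g. 50-59 -> 54.5."""
    lowest_score = (score // width) * width
    return lowest_score + (width - 1) / 2


# Build the frequency distribution: midpoint X_j -> frequency f_j.
freq = {}
for s in scores:
    x = class_midpoint(s)
    freq[x] = freq.get(x, 0) + 1

grouped_sum = sum(f * x for x, f in freq.items())          # as in (3.13)
grouped_sum_sq = sum(f * x ** 2 for x, f in freq.items())  # as in (3.16)
ungrouped_sum = sum(scores)
ungrouped_sum_sq = sum(s ** 2 for s in scores)

print(grouped_sum, ungrouped_sum)
print(grouped_sum_sq, ungrouped_sum_sq)
```

The grouped sums have only as many terms as there are classes, while the
ungrouped sums have one term per score; the two results agree only
approximately, as (3.17) and (3.18) state.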


3.6 SOME SIMPLE RULES REGARDING THE SUMMATION OPERATOR


In this section we shall consider some simple rules regarding the sum- 
mation operator. These rules will prove extremely useful to the student 
interested in following some of the derivations presented in later chapters 
of this book, as well as in any general reading he may do on the subject of 
statistics. They are stated in terms of the symbolic scheme for representing 
any collection of scores (see Section 3.2). 


RULE 3.1. The application of the summation operator, Σ, to the products
resulting from multiplying the scores of any collection by a constant multiplier
is the same as the product of this constant times the application of Σ to the
scores. Or symbolically,

$$\sum CX_i = C\sum X_i \tag{3.19}$$


That C represents a constant is indicated by the fact that no subscript is 
affixed to it. 


Example. It will prove helpful to the student to verify this rule in the 
case of a specific example. Consider the following collection of six scores 
(here N = 6):

    $X_1 = 3$     $X_4 = 10$
    $X_2 = 1$     $X_5 = 3$
    $X_3 = 7$     $X_6 = 6$

Now let C = 2. Then

$$\sum CX_i = (2)(3) + (2)(1) + (2)(7) + (2)(10) + (2)(3) + (2)(6) = 60$$
and
$$C\sum X_i = 2(3 + 1 + 7 + 10 + 3 + 6) = (2)(30) = 60$$


Proof. According to the definition of the summation operator, the
left member of (3.19) may be written

$$\sum CX_i = CX_1 + CX_2 + CX_3 + \cdots + CX_N$$

Now factoring by removing the common factor, C, we obtain

$$\sum CX_i = C(X_1 + X_2 + X_3 + \cdots + X_N)$$

And using the operator, Σ, to express the quantity in the parentheses,
we have

$$\sum CX_i = C\sum X_i$$

which, of course, is the equality we wished to establish.
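
Rule 3.1 can also be checked by direct computation. The sketch below uses the
six scores and the constant C = 2 of the example; the variable names `left`
and `right` are illustrative only.

```python
X = [3, 1, 7, 10, 3, 6]  # the six scores of the example
C = 2                    # the constant multiplier

left = sum(C * x for x in X)  # sum of the products C * X_i
right = C * sum(X)            # C times the sum of the X_i

print(left, right)
```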

RULE 3.2. Given two or more scores for each member of a group of N
individuals. The application of the summation operator, Σ, to the algebraic
sums of each individual's two or more scores is the same as the algebraic sum
of the results of applying Σ to the separate collections of scores. Or symbolically,

$$\sum (X_i + Y_i - Z_i) = \sum X_i + \sum Y_i - \sum Z_i \tag{3.20}$$

Example. To verify this rule in the case of a specific example, consider
the following three collections of scores, each of which involves the same
group of four individuals:

    $X_1 = 3$     $Y_1 = 1$     $Z_1 = 3$
    $X_2 = 7$     $Y_2 = 4$     $Z_2 = 5$
    $X_3 = 3$     $Y_3 = 2$     $Z_3 = 2$
    $X_4 = 2$     $Y_4 = 3$     $Z_4 = 5$

Then

$$\sum (X_i + Y_i - Z_i) = (3 + 1 - 3) + (7 + 4 - 5) + (3 + 2 - 2) + (2 + 3 - 5) = 1 + 6 + 3 + 0 = 10$$

And

$$\sum X_i + \sum Y_i - \sum Z_i = (3 + 7 + 3 + 2) + (1 + 4 + 2 + 3) - (3 + 5 + 2 + 5) = 15 + 10 - 15 = 10$$


Proof. According to the definition of the summation operator, the
left member of (3.20) may be written

$$\sum (X_i + Y_i - Z_i) = (X_1 + Y_1 - Z_1) + (X_2 + Y_2 - Z_2) + \cdots + (X_N + Y_N - Z_N)$$

Now, simply rearranging and grouping terms, we have

$$\sum (X_i + Y_i - Z_i) = (X_1 + X_2 + \cdots + X_N) + (Y_1 + Y_2 + \cdots + Y_N) - (Z_1 + Z_2 + \cdots + Z_N)$$

And using the operator, Σ, to express the three quantities in the parentheses, we
obtain

$$\sum (X_i + Y_i - Z_i) = \sum X_i + \sum Y_i - \sum Z_i$$

which is the equality we wished to establish.
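
As with Rule 3.1, a direct computation may help the student see Rule 3.2 at
work. The sketch below uses three collections of four scores whose separate
sums are 15, 10, and 15, as in the example; the variable names are
illustrative only.

```python
X = [3, 7, 3, 2]
Y = [1, 4, 2, 3]
Z = [3, 5, 2, 5]

# Apply the summation to each individual's algebraic sum X_i + Y_i - Z_i ...
left = sum(x + y - z for x, y, z in zip(X, Y, Z))
# ... and compare with the algebraic sum of the three separate summations.
right = sum(X) + sum(Y) - sum(Z)

print(left, right)
```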

RULE 3.3. The application of the summation operator, Σ, to N values of
some constant is the same as the product of N times this constant. Or sym-
bolically,

$$\sum C = NC \tag{3.21}$$

Proof. Note that

$$\sum C = C + C + \cdots + C \qquad (\text{for } N \text{ terms})$$

But the sum of N C's is the same as N times C. Hence,

$$\sum C = NC$$


In statistical work the application of these rules often occurs in com- 
bination. Hence we shall conclude this section with several examples 
illustrating their joint application. 


EXAMPLES 
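1. For a collection of N values of X show that

$$\sum (X_i - C) = \sum X_i - NC$$

Solution:

$$\sum (X_i - C) = \sum X_i - \sum C \qquad \text{(by Rule 3.2)}$$
$$= \sum X_i - NC \qquad \text{(by Rule 3.3)}$$

2. For a collection of N pairs of values of X and Y show that

$$\sum X_i(Y_i + a) = \sum X_iY_i + a\sum X_i$$

Solution:

$$\sum X_i(Y_i + a) = \sum (X_iY_i + aX_i) \qquad \text{(carrying out the indicated multiplication)}$$
$$= \sum X_iY_i + \sum aX_i \qquad \text{(by Rule 3.2)}$$
$$= \sum X_iY_i + a\sum X_i \qquad \text{(by Rule 3.1)}$$

3. For a collection of k values of W show that

$$\sum (aW_i - b)^2 = a^2\sum W_i^2 - 2ab\sum W_i + kb^2$$

Solution:

$$\sum (aW_i - b)^2 = \sum (a^2W_i^2 - 2abW_i + b^2) \qquad \text{(squaring)}$$
$$= \sum a^2W_i^2 - \sum 2abW_i + \sum b^2 \qquad \text{(by Rule 3.2)}$$
$$= a^2\sum W_i^2 - 2ab\sum W_i + \sum b^2 \qquad \text{(by Rule 3.1)}$$
$$= a^2\sum W_i^2 - 2ab\sum W_i + kb^2 \qquad \text{(by Rule 3.3)}$$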


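
The three results just derived by joint application of the rules can be
spot-checked numerically. The following sketch assumes small hypothetical
collections and constants (none taken from the text) and uses whole numbers so
that both sides of each identity can be compared exactly.

```python
# Hypothetical values chosen only for illustration.
X = [3, 1, 7, 10, 3, 6]
Y = [2, 5, 4, 1, 6, 3]
W = [2, 5, 1, 4]
a, b, C = 3, 4, 5
N, k = len(X), len(W)

# Example 1: sum(X_i - C) = sum(X_i) - N*C
ex1_left = sum(x - C for x in X)
ex1_right = sum(X) - N * C

# Example 2: sum(X_i (Y_i + a)) = sum(X_i Y_i) + a * sum(X_i)
ex2_left = sum(x * (y + a) for x, y in zip(X, Y))
ex2_right = sum(x * y for x, y in zip(X, Y)) + a * sum(X)

# Example 3: sum((a W_i - b)^2) = a^2 sum(W_i^2) - 2ab sum(W_i) + k b^2
ex3_left = sum((a * w - b) ** 2 for w in W)
ex3_right = a ** 2 * sum(w ** 2 for w in W) - 2 * a * b * sum(W) + k * b ** 2

print(ex1_left, ex1_right)
print(ex2_left, ex2_right)
print(ex3_left, ex3_right)
```

A numerical check of this kind does not replace the derivations; it only
confirms that the identities hold for one particular set of values.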


3.7 REPRESENTATION OF A RELATIVE FREQUENCY DISTRIBUTION 


In Section 2.9 it was observed that frequencies are sometimes reported 
as fractions of the total number of scores involved. To represent any such 
relative frequency distribution we shall employ the same scheme as was 
used with an ordinary frequency distribution, except that we shall represent 
the relative frequencies by $p_1, p_2, \ldots, p_c$. That is, if j represents any class
identification number,

$$p_j = \frac{f_j}{N} \tag{3.22}$$

The complete scheme is shown in Table 3.2.

TABLE 3.2   Symbolic Representation of Any Relative Frequency Distribution

    CLASS MIDPOINTS    RELATIVE FREQUENCIES
    $X_1$              $p_1$
    $X_2$              $p_2$
    $X_3$              $p_3$
    ...                ...
    $X_c$              $p_c$

Or if we use the form of (3.10) we have

$$X_j,\ p_j \qquad (j = 1, 2, \ldots, c) \tag{3.23}$$

In any relative frequency distribution the sum of the c relative fre-
quencies is 1. This may be demonstrated as follows:

$$\sum p_j = \sum \frac{f_j}{N} = \sum \frac{1}{N} f_j$$

But $\frac{1}{N}$ is a constant. Hence, by Rule 3.1,

$$\sum_{j=1}^{c} p_j = \frac{1}{N}\sum_{j=1}^{c} f_j = \frac{1}{N}(N) = 1 \qquad [\text{see (3.10)}]$$


3.8 COMPUTATIONAL RESULTS IN TERMS OF THE RELATIVE
FREQUENCY DISTRIBUTION NOTATIONAL SCHEME

In this section we shall present expressions for the approximate sum
and approximate sum of squares of the original scores in terms of the rel-
ative frequency distribution notational scheme. These expressions are equiv-
alent to those of (3.13) and (3.16).

First note that by (3.22)

$$f_j = Np_j \tag{3.24}$$

Now, beginning with (3.13) we have

$$\sum f_jX_j = \sum Np_jX_j$$

And applying Rule 3.1 we obtain

$$\sum f_jX_j = N\sum p_jX_j \tag{3.25}$$

Similarly, beginning with (3.16) it may be shown that

$$\sum f_jX_j^2 = N\sum p_jX_j^2 \tag{3.26}$$
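
The relations (3.22), (3.25), and (3.26) may be illustrated with a small
computation. The sketch below assumes a hypothetical frequency distribution of
three classes (the midpoints and frequencies are not from the text) and uses
Python's `fractions.Fraction` so that the relative frequencies are exact.

```python
from fractions import Fraction

# Hypothetical distribution: class midpoints X_j and frequencies f_j.
midpoints = [Fraction(149, 2), Fraction(129, 2), Fraction(109, 2)]  # 74.5, 64.5, 54.5
freqs = [3, 5, 2]
N = sum(freqs)

# Relative frequencies p_j = f_j / N, as in (3.22).
p = [Fraction(f, N) for f in freqs]

sum_p = sum(p)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
sum_px = sum(pj * x for pj, x in zip(p, midpoints))
sum_fx2 = sum(f * x ** 2 for f, x in zip(freqs, midpoints))
sum_px2 = sum(pj * x ** 2 for pj, x in zip(p, midpoints))

print(sum_p)                 # the relative frequencies sum to 1
print(sum_fx, N * sum_px)    # the two sides of (3.25)
print(sum_fx2, N * sum_px2)  # the two sides of (3.26)
```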


3.9 A SCHEME FOR REPRESENTING A COLLECTION OF SCORES
ORGANIZED INTO SUBGROUPS OR SUBSETS OF SCORES


Not infrequently the data involved in a statistical study or investiga- 
tion are organized into two or more subgroups or subsets. For example, 
data collected for the purpose of studying sex differences in achievement in 
some school subject, say, arithmetic, fall naturally into two subgroups, one 
consisting of measures of the level of achievement of boys and the other 
consisting of similar measures of the achievement of girls. Or data may be 
collected for the purpose of comparing the relative effectiveness of four 
methods of memorizing a poem. (For example, one method might consist 
of learning one line at a time, another of learning one sentence at a time,
another of learning one verse at a time, and another of learning two or more
verses at a time.) These data might consist of measures of the times
required by the individuals studying under each method to become word
perfect in two successive recitations of the poem. In this situation, the
data, that is, the time scores, fall naturally into four subgroups, one for
those learning the poem by Method 1, a second for those learning the poem
by Method 2, etc.

In setting up a symbolic scheme for this situation we shall identify each 
individual or object by two identifying numbers. The number written first 
will identify the subgroup to which the individual belongs. The second 
number will identify the individual within the subgroup. The first sub-
group (the identification of a particular subgroup as the first subgroup is
purely arbitrary) will be assigned the identifying Number 1, the second the
Number 2, etc. If there are in all k subgroups, the last will be assigned the
identifying number represented by k. As the general designation, we shall
use the letter j. The letter j, therefore, represents any integer from 1 to k
inclusive.

We shall use the letter n to represent the number of individuals in a
subgroup. If the subgroups are all of the same size, no reference to a spe-
cific subgroup is needed in connection with the use of n. On the other hand,
if the subgroups are made up of varying numbers of individuals or objects,
it will be necessary, in representing the number of objects in a subgroup,
to identify the subgroup concerned. To do this we shall affix the subgroup
identification number as a subscript to n. Thus $n_1$ represents the number
of individuals in Subgroup 1, $n_2$ the number of individuals in Subgroup 2,
etc. The number of individuals in the last subgroup will be represented by
$n_k$ and the number in any subgroup by $n_j$.

Besides identifying the group to which an individual belongs, it is 
necessary to distinguish him from the other individuals belonging to the 
same group. As has been suggested, this will be the function of the second 
identifying number which we shall write in the second position, that is, 
following the group identification number. Thus, the first individual* in
the first subgroup will be identified by the two numbers (digits) 11 (read
"one one" not "eleven"), the second individual in the first subgroup by the
two numbers (digits) 12, etc. Since there are $n_1$ individuals in all in the
first subgroup, the last individual in this group would be identified by $1n_1$.
Of course, if all subgroups contain an equal number of individuals there is
no need to attach an identifying subscript to the group number n. In this
case the last individual in this group would simply be identified by 1n. As
the general designation for an individual we shall use the letter i, so that
any individual in Subgroup 1 may be designated by 1i. In this scheme, the
first individual in the second subgroup is identified by 21, or the fourth
individual in the third subgroup by 34. Any individual in any subgroup is
designated by ji.


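TABLE 3.3   Symbolic Representation of Any Collection of Scores
            Organized into Subgroups

                              SUBGROUP
    1            2            ...    j            ...    k
    $X_{11}$     $X_{21}$     ...    $X_{j1}$     ...    $X_{k1}$
    $X_{12}$     $X_{22}$     ...    $X_{j2}$     ...    $X_{k2}$
    ...          ...                 ...                 ...
    $X_{1i}$     $X_{2i}$     ...    $X_{ji}$     ...    $X_{ki}$
    ...          ...                 ...                 ...
    $X_{1n_1}$   $X_{2n_2}$   ...    $X_{jn_j}$   ...    $X_{kn_k}$

* The designation of a particular individual as the first individual in a
given subgroup is purely arbitrary.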




As before, we shall represent the score value for a given individual by 
an X to which that individual's two identifying numbers are attached as 
subscripts. Thus, $X_{11}$ represents the score of Individual 1 in Subgroup
1, $X_{34}$ the score of Individual 4 in Subgroup 3, $X_{kn_k}$ the score of
Individual $n_k$ in Subgroup k (i.e., the last individual in the last subgroup),
and $X_{ji}$ the score of any individual in any subgroup.

There are several ways in which the entire collection may now be repre- 
sented. One of these is shown in Table 3.3. The fact that the group sub- 
scripts have been affixed to the group n’s implies that the number of indi- 
viduals in one group may differ from that in another. 

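A second method of presentation involves the application of (3.2) to
each subgroup, thus:

$$\begin{aligned}
X_{1i} &\quad (i = 1, 2, \ldots, n_1) \\
X_{2i} &\quad (i = 1, 2, \ldots, n_2) \\
&\;\;\vdots \\
X_{ji} &\quad (i = 1, 2, \ldots, n_j) \\
&\;\;\vdots \\
X_{ki} &\quad (i = 1, 2, \ldots, n_k)
\end{aligned} \tag{3.27}$$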


A third method of writing is

$$X_{ji} \qquad (j = 1, 2, \ldots, k;\ i = 1, 2, \ldots, n_j) \tag{3.28}$$

This notation implies that for each value of j, the letter i takes in turn
the values $1, 2, \ldots, n_j$. That is, while j remains 1, i takes in turn the
values $1, 2, \ldots, n_1$, and while j remains 2, i takes in turn the values
$1, 2, \ldots, n_2$, etc.


3.10 COMPUTATIONAL RESULTS IN TERMS OF THE SCHEME OF SECTION 3.9


Suppose that it is required to determine the sum of all the scores in the 
entire collection. To attack this problem systematically we shall first ob- 
tain the subtotal for each subgroup. Then the required grand total may 
be obtained by combining these group subtotals. These group subtotals
may be represented by (3.5). In writing them we have specified the limits
of summation to direct attention to the fact that the sum for one group
may involve a different number of terms than that for another group.


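
$$\sum_{i=1}^{n_1} X_{1i} = \text{subtotal for Group 1}$$
$$\sum_{i=1}^{n_2} X_{2i} = \text{subtotal for Group 2}$$
$$\vdots$$
$$\sum_{i=1}^{n_j} X_{ji} = \text{subtotal for Group } j$$
$$\vdots$$
$$\sum_{i=1}^{n_k} X_{ki} = \text{subtotal for Group } k$$

Now we shall introduce a second summation operator to indicate the
summation of these k subtotals. That is, we shall write

$$\sum_{j=1}^{k}\sum_{i=1}^{n_j} X_{ji} = \sum_{i=1}^{n_1} X_{1i} + \sum_{i=1}^{n_2} X_{2i} + \cdots + \sum_{i=1}^{n_j} X_{ji} + \cdots + \sum_{i=1}^{n_k} X_{ki} \tag{3.29}$$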

Similarly, the sum of the squares of all the scores in the entire collection
may be represented by

$$\sum_{j=1}^{k}\sum_{i=1}^{n_j} X_{ji}^2 \tag{3.30}$$

It should be noted in the special case in which the groups are of equal
size, that is, contain the same number of individuals, that the group
identification subscript may be dropped from the n. In this case, then, the
sum of all the scores may be represented by

$$\sum_{j=1}^{k}\sum_{i=1}^{n} X_{ji} \tag{3.31}$$

and the sum of the squares of all the scores by

$$\sum_{j=1}^{k}\sum_{i=1}^{n} X_{ji}^2 \tag{3.32}$$

It is convenient in this scheme to use the upper case N to represent the
number of scores in the entire collection. That is,

$$N = n_1 + n_2 + \cdots + n_k$$
or
$$N = \sum_{j=1}^{k} n_j \tag{3.33}$$

In the special case in which the groups contain the same number of
individuals

$$N = \sum_{j=1}^{k} n$$

Or, applying Rule 3.3,

$$N = kn \tag{3.34}$$


To illustrate, consider the following collection, which consists of three
groups of scores:

Group 1: 3, 8, 4
Group 2: 7, 2, 5, 6, 5
Group 3: 1, 9

In this collection $X_{11} = 3$, $X_{23} = 5$, etc. Here k = 3, $n_1 = 3$, $n_2 = 5$, and
$n_3 = 2$. The sum of the scores in the entire collection is

$$\sum_{j=1}^{3}\sum_{i=1}^{n_j} X_{ji} = \sum_{i=1}^{3} X_{1i} + \sum_{i=1}^{5} X_{2i} + \sum_{i=1}^{2} X_{3i}$$
$$= (3 + 8 + 4) + (7 + 2 + 5 + 6 + 5) + (1 + 9)$$
$$= 15 + 25 + 10$$
$$= 50$$

The total number of scores in this collection is

$$N = \sum_{j=1}^{3} n_j = n_1 + n_2 + n_3 = 3 + 5 + 2 = 10$$
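
The double summation of (3.29) corresponds to a sum over groups of the sums
within groups. The following sketch represents the three groups of the
illustration as a list of lists; the helper `X` and the other names are
illustrative only and are not part of the text's notation.

```python
# The three groups of scores from the illustration.
groups = [
    [3, 8, 4],
    [7, 2, 5, 6, 5],
    [1, 9],
]

k = len(groups)                # number of subgroups
n = [len(g) for g in groups]   # n_1, n_2, n_3
N = sum(n)                     # total number of scores, as in (3.33)


def X(j, i):
    """Score of individual i in subgroup j (1-based subscripts, group first)."""
    return groups[j - 1][i - 1]


subtotals = [sum(g) for g in groups]                  # one inner sum per group
grand_total = sum(subtotals)                          # outer sum, as in (3.29)
sum_squares = sum(x ** 2 for g in groups for x in g)  # as in (3.30)

print(subtotals, grand_total, N, X(2, 3))
```

Because the groups differ in size, each inner sum has its own upper limit,
which is why the subscript on n cannot be dropped here.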


3.11 THE SITUATION IN WHICH TWO OR MORE MEASURES
ARE AVAILABLE FOR EACH INDIVIDUAL


Suppose that a test of achievement in language skills at the eighth-
grade level consists of several distinct parts (say, for example, parts dealing
with spelling, punctuation, capitalization, and usage) with a separate score
being derived from each part. In this situation there will be as many
measures or scores available for each individual tested as there are distinct
parts of the test. In this section we shall present a notational scheme for
such a collection of data.

As before, we shall identify each of the n individuals by one of the
integers 1, 2, ..., n, and we shall use i to represent any one of these integers.
The m separate tests will be identified by the integers 1, 2, ..., m. As the
general designation for the individual tests we shall use the letter j. That
is, j represents any integer from 1 to m inclusive. The value of a score on a
part will be represented by the letter X. To indicate the score made on a
specific part by a specific individual we shall use two identifying numbers,
one to designate the part and the other the individual. We shall, as before,
affix these numbers as subscripts to the X's, writing the number identifying
the test part in the first position and the number identifying the individual
in the second position. Thus, the score made on Part 1 by Individual 1 would
be represented by $X_{11}$, on Part 1 by Individual 2 by $X_{12}$, on Part 1 by
Individual n by $X_{1n}$, on Part 2 by Individual 1 by $X_{21}$, and so on, with
the score on the last part by Individual n being represented by $X_{mn}$.
The entire collection is presented in Table 3.4.


TABLE 3.4   Symbolic Representation of a Collection Involving
            m Scores for Each of n Individuals

                                    PARTS
    INDIVIDUALS    1          2          ...    j          ...    m
    1              $X_{11}$   $X_{21}$   ...    $X_{j1}$   ...    $X_{m1}$
    2              $X_{12}$   $X_{22}$   ...    $X_{j2}$   ...    $X_{m2}$
    ...
    i              $X_{1i}$   $X_{2i}$   ...    $X_{ji}$   ...    $X_{mi}$
    ...
    n              $X_{1n}$   $X_{2n}$   ...    $X_{jn}$   ...    $X_{mn}$


3.12 COMPUTATIONAL RESULTS IN TERMS OF THE NOTATIONAL
SCHEME OF SECTION 3.11


It is often necessary in a situation of the type under consideration to
represent the sum of the scores made on the separate parts by a single
individual. These sums may be represented by an application of (3.5).
In the case of Individual 1, we have

$$\sum_{j=1}^{m} X_{j1} = X_{11} + X_{21} + \cdots + X_{j1} + \cdots + X_{m1}$$

Or in the case of Individual i, that is, any individual,

$$\sum_{j=1}^{m} X_{ji} = X_{1i} + X_{2i} + \cdots + X_{ji} + \cdots + X_{mi} \tag{3.35}$$

We may also apply (3.5) to the problem of representing the sum of the
scores made by all n individuals on one of the separate parts. Thus, the
sum of the scores made by the n individuals on Part 1 may be indicated by

$$\sum_{i=1}^{n} X_{1i} = X_{11} + X_{12} + \cdots + X_{1i} + \cdots + X_{1n}$$

or the sum of the n scores on any one part, say, Part j, may be indicated by

$$\sum_{i=1}^{n} X_{ji} = X_{j1} + X_{j2} + \cdots + X_{ji} + \cdots + X_{jn} \tag{3.36}$$

Note that the sums represented by (3.35) are the sums of the scores in
the rows of Table 3.4. It is possible to represent the sum of all the scores
in the entire collection, that is, of all the scores made by all the individuals
on all the separate parts, by applying (3.5) to these row sums. This gives

$$\sum_{i=1}^{n}\sum_{j=1}^{m} X_{ji} = \sum_{j=1}^{m} X_{j1} + \sum_{j=1}^{m} X_{j2} + \cdots + \sum_{j=1}^{m} X_{ji} + \cdots + \sum_{j=1}^{m} X_{jn} \tag{3.37}$$

It is, of course, also possible to find the sum for the entire collection by
summing the column totals. This sum may be represented by applying
(3.5) to the column sums as represented by (3.36). This gives

$$\sum_{j=1}^{m}\sum_{i=1}^{n} X_{ji} = \sum_{i=1}^{n} X_{1i} + \sum_{i=1}^{n} X_{2i} + \cdots + \sum_{i=1}^{n} X_{ji} + \cdots + \sum_{i=1}^{n} X_{mi} \tag{3.38}$$

Since the sum for the entire collection is the same, regardless of whether
it is computed by totaling the row sums or the column sums, it follows that
the result of (3.37) is the same as that of (3.38). That is,

$$\sum_{i=1}^{n}\sum_{j=1}^{m} X_{ji} = \sum_{j=1}^{m}\sum_{i=1}^{n} X_{ji} \tag{3.39}$$

Verbally (3.39) simply states that in Table 3.4 the sum of the row sums
equals the sum of the column sums. In obtaining the sum for the entire
collection both the sum of the row sums (3.37) and the sum of the column
sums (3.38) should be obtained as a check against possible errors in addition.

Finally, it should be noted that since there are m scores in each of n
rows, the total number of scores, N, in the entire collection is given by

$$N = \sum_{i=1}^{n} m = nm \qquad [\text{see Rule 3.3}] \tag{3.40}$$

That is, since in this scheme there are n rows each containing m scores,
the total number of scores in the entire collection is given by the product
of n times m.

To illustrate the computational results which have been derived in this
section, we shall use the following collection of scores made by seven indi-
viduals on a test involving four distinct parts. The data are presented in
Table 3.5. Here m = 4 and n = 7. Reference to this table shows that the
score value represented by $X_{11}$ is 10, by $X_{12}$ is 13, by $X_{41}$ is 8, by $X_{32}$ is 5,
by $X_{46}$ is 14, etc.


TABLE 3.5   Scores Made by Seven Individuals on a Language
            Test Involving Four Subtests in Spelling, Punctua-
            tion, Capitalization and Usage

                        SUBTESTS              COMPOSITE
    INDIVIDUALS     1     2     3     4       SCORE
        1          10    15     7     8         40
        2          13    21     5     1         40
        3           4     6     3     7         20
        4          21     7    11    23         62
        5           7    10     8     9         34
        6          16    14    10    14         54
        7           8     7     6     9         30
                   79    80    50    71        280


Applying (3.35) in the case of Individual 1, i.e., Row 1, we have

$$\sum_{j=1}^{4} X_{j1} = 10 + 15 + 7 + 8 = 40$$

or in the case of Individual 5, we have

$$\sum_{j=1}^{4} X_{j5} = 7 + 10 + 8 + 9 = 34$$

Applying (3.36) in the case of Subtest 1, we have

$$\sum_{i=1}^{7} X_{1i} = 10 + 13 + 4 + 21 + 7 + 16 + 8 = 79$$

or in the case of Subtest 3,

$$\sum_{i=1}^{7} X_{3i} = 7 + 5 + 3 + 11 + 8 + 10 + 6 = 50$$

The sum for the entire collection as given by (3.37) is

$$\sum_{i=1}^{7}\sum_{j=1}^{4} X_{ji} = \sum_{j=1}^{4} X_{j1} + \sum_{j=1}^{4} X_{j2} + \cdots + \sum_{j=1}^{4} X_{j7}$$
$$= 40 + 40 + 20 + 62 + 34 + 54 + 30 = 280$$

Similarly, the sum for the entire collection as given by (3.38) is

$$\sum_{j=1}^{4}\sum_{i=1}^{7} X_{ji} = \sum_{i=1}^{7} X_{1i} + \sum_{i=1}^{7} X_{2i} + \sum_{i=1}^{7} X_{3i} + \sum_{i=1}^{7} X_{4i}$$
$$= 79 + 80 + 50 + 71 = 280$$
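
The check of row sums against column sums described by (3.39) is easily
mechanized. The sketch below stores the scores of Table 3.5 with one row per
individual. It rests on an assumption: the entries 21 and 1 for Individual 2
and 23 for Individual 4 are values chosen to agree with the stated row and
column totals, and should be verified against the table itself. The accessor
`X(j, i)` is illustrative; it follows the text in writing the part subscript
first.

```python
# One row per individual (1..7), one column per subtest (1..4).
# Assumed entries: 21 and 1 (Individual 2) and 23 (Individual 4) were chosen
# to agree with the stated row and column totals.
table = [
    [10, 15, 7, 8],
    [13, 21, 5, 1],
    [4, 6, 3, 7],
    [21, 7, 11, 23],
    [7, 10, 8, 9],
    [16, 14, 10, 14],
    [8, 7, 6, 9],
]
n = len(table)     # number of individuals
m = len(table[0])  # number of parts


def X(j, i):
    """Score on part j made by individual i (part subscript first)."""
    return table[i - 1][j - 1]


row_sums = [sum(X(j, i) for j in range(1, m + 1)) for i in range(1, n + 1)]  # (3.35)
col_sums = [sum(X(j, i) for i in range(1, n + 1)) for j in range(1, m + 1)]  # (3.36)
total_by_rows = sum(row_sums)  # (3.37)
total_by_cols = sum(col_sums)  # (3.38)
N = n * m                      # (3.40)

print(row_sums, col_sums, total_by_rows, total_by_cols, N)
```

Agreement of `total_by_rows` and `total_by_cols` is the check against errors
in addition that the text recommends.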




one whose height is 67 inches. The second type of comparison cited, which
makes it possible to state meaningfully that Object A is, say, 2.5 times as
heavy as Object B, implies not only a constant unit but measurement from an
absolute zero as a reference point as well. Thus, a 50-pound sack of flour
is 2.5 times as heavy as a 20-pound sack of flour. If, however, the reference
point of the measuring scale had arbitrarily been placed at 10 pounds in- 
stead of at zero pounds, the two sacks of flour would be reported as having 
weights of 40 and 10 pounds respectively. While the difference between 
the measurements remains unchanged (30 pounds), it is obvious that the 
ratio method of comparison is no longer valid. That is, it cannot validly 
be stated that the first sack is 4 times as heavy as the second. 
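
The arithmetic of the flour example can be set out in a few lines. In the
sketch below the variable names are illustrative only; the weights and the
10-pound shift of the reference point are those of the text.

```python
heavy, light = 50, 20  # weights in pounds measured from a true zero
shift = 10             # reference point arbitrarily moved to 10 pounds

# Readings on the scale whose reference point has been shifted.
heavy_shifted = heavy - shift
light_shifted = light - shift

difference_true = heavy - light
difference_shifted = heavy_shifted - light_shifted
ratio_true = heavy / light
ratio_shifted = heavy_shifted / light_shifted

print(difference_true, difference_shifted)  # the difference is unchanged
print(ratio_true, ratio_shifted)            # the ratio is not
```

The difference survives the change of reference point because the shift
cancels in subtraction; the ratio does not, which is why ratio comparisons
require an absolute zero.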

There is a further important aspect of a fundamental measuring scale 
which is referred to as the character of additivity. This means that the 
attribute or property involved must be such that two objects possessing it 
can be combined to form a third object which will then possess this same 
property in an amount equal to the sum of the amounts possessed by its 
two component objects. For example, if we combine the 20- and 50-pound 
sacks of flour into a third sack, we know its weight will be 70 pounds. 

But it is not so much our purpose to develop the concept of a funda-
mental measuring scale as it is to point out that a great many of the measur-
ing scales with which we deal do not possess the characteristics of funda-
mental measuring scales. This is true of the scales employed in many areas,
and it is particularly true of the scales employed in educational and psycho-
logical measurement. In the measurement of educational achievement, for
example, a test score usually represents the number of items or exercises to
which the person tested has made the response regarded as correct. Thus,
if a pupil makes a score of 80 on a 150-word spelling test, this score indicates
that he has spelled 80 of the words correctly. The meaningfulness of this
score depends, of course, upon the range and distribution of difficulty of the
words constituting the test. If the test contains 100 very easy words, this
score does not necessarily mean that the individual making it is a very
good speller. On the other hand, if the test consists exclusively of very
difficult words, a score of 80 may represent a remarkable performance.

We shall now examine a scale such as is represented by this spelling test
against the criteria which we have described as characteristic of a funda-
mental measuring scale. The meaning of differences between pairs of scores
on this spelling test, like the meaning of a single score, depends upon the
range and distribution of difficulty of the items. Suppose that Pupil A
spelled 30 words, B spelled 60, and C spelled 90 words correctly. Suppose,
further, that the test contains 70 very easy words and 70 very difficult
words, with only 10 words of intermediate difficulty. In this case, the differ-
ence in spelling ability between C and B would probably be greater than that
between B and A, since A and B might both have been able to spell only
very easy words, while C was able to spell some of the very difficult words.




PERCENTILE RANKS AND PERCENTILES 


On the scale of scores for this test, then, the “unit” employed would repre- 
sent larger amounts of spelling ability at some points than at others. 

Similarly, a score of zero on a test of this kind would have no absolute 
significance. If a pupil fails to spell any word in a spelling test—that is, if 
he makes a score of zero—it obviously does not follow that he has absolutely 
no spelling ability. There may be words that he can spell which were simply 
not included in this particular test. Consequently, it is not meaningful to 
say that B, who spelled 60 words, has two times as much spelling ability as 
A, who spelled 30 words. That is, since the "units" fluctuate in a more or 
less unknown way at different points along the scale, and since a zero score 
has no absolute significance, it is not validly possible to compare individual 
scores by either the difference or the ratio methods. 

Moreover, the characteristic of additivity is lacking in such a scale. 
Suppose, for example, that the test contains 75 easy words and 75 difficult 
words, and that Pupils E and F each scored 75. It is not likely, under these 
circumstances, that the combined efforts of E and F would result in a score 
of 150, or, for that matter, in a score very much greater than 75. 

It is apparent from the foregoing examples that a single score such as 
is derived from most educational and psychological tests has little, if any, 
absolute significance—that is, it is not capable of meaningful interpretation 
when considered alone. Scores on such tests usually have rank-order mean- 
ing only; that is, they are ordinarily useful only in determining an indi- 
vidual's probable rank in a given group. Such tests, if reliable and valid, 
enable us to determine whether A is likely to possess more of the trait or 
property in question than B or C, but not how much more or how many 
times more. The fact that a given pupil has made a score of 70 on a test in 
United States history, for example, in itself tells us nothing about the 
quality or magnitude of his achievement. In order to interpret this per- 
formance we must not only be intimately acquainted with the test itself, 
but must know what scores have been made on the same test by other 
pupils in a group to which the given individual belongs, and we must know
something about the nature of that group, that is, whether it is made up of
college or high school or elementary school pupils; what kind or amount of
instruction they have had; the level and range of their intelligence; and
so on. Measuring scales which are thus limited to the determination of
the rank of an object or individual in a specified group are known as
rank-order scales. Our attention in this chapter will be centered upon one
important technique which will facilitate the interpretation of measurements
derived from rank-order scales. 


It is also important to note that this technique may be useful with
measures derived from fundamental as well as from rank-order scales. Sup-
pose, for example, that the height of a certain individual is 52 inches. It is
true that without seeing this individual we have some conception of his
height. This conception is, of course, based upon our familiarity with the
inch as a unit of measurement. Until we can obtain additional information
about this individual, however, we are not in a position to draw any very
meaningful conclusions about his height. We don't even know whether
he is short or tall. If we are now told that he is an adult white male we know
at once, because of our familiarity with the normal heights of adult white
males, that he is extremely short. Suppose, however, we are told that he is
a nine-year-old Canadian boy. Unless we are quite familiar with the dis-
tribution of heights for such boys we still do not know whether he is short,
medium, or tall in stature. To be able to describe him in this way, we would
need to know at least whether the height 52 inches falls in the lower, middle,
or higher third of such a distribution.

In other words, to know that an individual's height is 52 inches may,
by itself, be only slightly more informative than the knowledge that an
individual's score on a United States history test is 70. Neither measure
alone tells us whether he is short or tall or poor or good. Information of
this latter type requires not only some description of the individual but also
a familiarity with the distribution of heights or test performances for other
individuals of the same type. That is, we must be able to place or rank the
given measure within a collection of such measures in order to characterize
the individual fully, and this applies regardless of whether the given measure
is derived from a fundamental or a rank-order scale. The techniques to be
considered in this chapter have to do with the placement or ranking of a
given score in a collection of such scores.


4.2 PERCENTILE RANKS: DEFINITION 


In view of these problems in the interpretation of measurements, it is 
essential that we have some means of deriving, from the original, or raw, 
scores, other scores which are directly indicative of the rank or placement 
of each raw score in a collection or distribution of such scores. A device 
which first comes to mind for such a purpose is simply that of determining
the rank of each score in the collection of scores in which it is found. The 
rank of a score indicates its position in a series of scores formed by arranging 
all scores in order of magnitude. Thus, a rank of 30 for a given score would 
indicate that the score is 30th from the top (or bottom) when all scores have 
been arranged in order of size. 

The meaning of such a rank score, however, obviously depends to a
considerable extent upon the number of scores in the series. To rank 30th
in a group of 40 clearly does not mean the same thing as to rank 30th in a
group of 400. This difficulty may be overcome to a large degree by stating
the rank of a score in relation to the total number of scores in the series.
The customary practice is to report the rank of a score by stating the
percentage of scores in the entire collection which are smaller than this
score. Ranks thus reported are known as percentile ranks.


DEFINITION. The percentile rank of a given point on a score scale is the 
percentage of measures in the whole distribution which are below this given 
point. 


It will be noted that the definition of percentile rank has been stated 
with reference to a point on the score scale. It will be recalled that it has 
previously been observed that most of the traits studied in education and 
psychology are matters of growth and development, and as such can be 
regarded as continuous variables. In fact, it has been agreed that insofar 
as the techniques presented in this book are concerned, all data will be 
treated as continuous, regardless of their true character. It is for this reason 
that the definition has been stated with reference to a score point. It is 
necessary, however, that we examine this aspect of the definition carefully
so that its implications will be fully appreciated. 

Assuming, in keeping with psychological and educational test theory, 
that a raw test score represents, in terms of the particular test scale, a 
measurement taken to the nearest unit, it follows that the actual value of 
an individual's test score should be interpreted as corresponding to some 
scale point between one-half score unit below and one-half score unit above 
his obtained raw score. The exact location of the scale point within these 
limits is, of course, unknown. Suppose now that it is required to determine 
the percentile rank of the score or unit point 33 on a certain test scale. 
Suppose, moreover, that a particular individual has earned a raw score of
33 on this test. We shall further assume that the series involves a total of
50 raw scores, of which 40 are below 33 in value, and that this particular
individual's score is the only one in the collection having the reported
value 33. Now the exact location of the scale point representing this
individual's test performance is known only to be somewhere in the interval
from 32.5 to 33.5. If it falls in the lower half of this interval (i.e.,
between 32.5 and 33), then the percentile rank of the score point 33 is 82
(i.e., 41 expressed as a percentage of 50). On the other hand, if it falls
in the upper half of this interval (i.e., between 33 and 33.5), then the
percentile rank of the score point 33 is 80 (i.e., 40 expressed as a
percentage of 50). Since its actual location is unknown and just as likely
to be in one half of the interval 32.5-33.5 as in the other, we shall
arbitrarily choose a value half-way between these two extreme possibilities
(80 and 82) as our estimate of the percentile rank of 33. That is, we shall
report 81 as the percentile rank of this individual's score of 33. It will
be noted that 81 represents 40.5 expressed as a percentage of 50. In a sense
it represents an arbitrary compromise arrived at by treating the
individual's score of 33 as being split evenly between the two halves of the
interval 32.5-33.5.

This arbitrary compromise or convention may also be extended to
apply in situations involving the determination of the percentile ranks of
score points where more than one individual obtains a raw score
corresponding to the score point. Suppose that in the foregoing example 3
individuals instead of one had made raw scores of 33. Applying the
convention in this situation amounts to treating these three scores as though
they were evenly spread throughout the interval 32.5 to 33.5. This means
that one-half of these three scores, or 1.5 scores, are regarded as falling in
the lower half of this score interval (i.e., between 32.5 and 33). Hence, a
total of 41.5 (i.e., 40 + 1.5) scores are regarded as falling below the score
point 33, and the percentile rank of this point in this situation is taken to
be 83 (i.e., 41.5 expressed as a percentage of 50). It should be noted that
percentile ranks of score points determined in accordance with the above
convention are necessarily only estimates of the true percentile ranks of
these points for the given group of individuals.
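
The convention just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `percentile_rank` is hypothetical, and the filler values 20 and 40 merely stand in for "some score below 33" and "some score above 33", since the text gives only the counts.

```python
def percentile_rank(scores, point):
    """Percentile rank of an integer score point.

    Scores equal to the point are treated as evenly spread over the unit
    interval (point - 0.5, point + 0.5), so half of them count as below it.
    """
    below = sum(1 for s in scores if s < point)   # scores wholly below the point
    at = sum(1 for s in scores if s == point)     # scores inside the unit interval
    return 100.0 * (below + 0.5 * at) / len(scores)


# 40 scores below 33, one score of 33, 9 scores above (filler values assumed).
one_at_33 = [20] * 40 + [33] + [40] * 9
# Same collection of 50, but with three scores of 33.
three_at_33 = [20] * 40 + [33] * 3 + [40] * 7

print(percentile_rank(one_at_33, 33))    # 81.0
print(percentile_rank(three_at_33, 33))  # 83.0
```

The two printed values correspond to the 40.5 and 41.5 scores regarded as falling below the point 33 in the two situations discussed above.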


4.3 PERCENTILES: DEFINITION


A percentile is the inverse of a percentile rank. Whereas the percentile 
rank of a particular score point is the percentage of scores falling below this 
point in the ordered series of scores, the value of this point itself is the per- 
centile corresponding to this percentile rank. Thus, the 90th percentile is 
the point on the score scale below which 90 per cent of the scores fall. The 
percentile rank of this point is 90, but the particular value of this point 
itself is the 90th percentile. 


DEFINITION. The xth percentile of a given score distribution is the point 
on the score scale below which x per cent of the scores fall. 


It is important to distinguish carefully between the terms percentile 
and percentile rank. The percentile rank of a given score is the number 
representing the percentage of scores in the total group lying below the given 
score point, while the percentile is the score point below which a given per- 
centage of the scores lie. The 28th percentile in a certain distribution might, 
for example, be 112 pounds, but the percentile rank of an individual of this 
weight—that is, of the score point 112—in this distribution is 28. 


4.4 NOTATION AND SPECIAL PERCENTILES DEFINED 


Percentile rank is commonly represented by %-ile rank, %-ile rk, or
PR. In this book the latter notation (PR) will be employed. The xth
percentile may be written xth %-ile or Px. The latter, that is, the
upper-case P with a numerical subscript indicating the particular percentile
involved, will be used in this book. Thus, the symbol P90 indicates the
value of the 90th percentile.

There are certain percentile points which are of sufficient special
importance to warrant being designated by special names and symbols. The
nine percentile points which divide the distribution into ten equal sets of
scores are known as deciles. The decile point below which 10 per cent of
the scores fall (i.e., the 10th percentile, P10) is known as the first decile and
is designated by the symbol D1. The decile point below which 20 per cent
of the scores fall (i.e., the 20th percentile, P20) is known as the second decile
and is designated by the symbol D2, etc.

The three percentile points which divide the distribution into four equal
sets of scores are known as quartiles. The quartile point below which 25
per cent of the scores fall (i.e., P25) is known as the first or lower quartile and
is designated by the symbol Q1. The quartile point below which 50 per cent
of the scores fall (i.e., P50) is known as the second or middle quartile and is
designated Q2. The quartile point below which 75 per cent of the scores
fall (i.e., P75) is known as the third or upper quartile and is designated Q3.

The percentile point which divides the distribution into two equal sets
of scores, that is, the point below and above which 50 per cent of the scores
lie, is known as the median. It is variously represented by the symbols
Mdn, Me, Mn, and Md. In this book the first of these (Mdn) will be
employed. It should also be noted that the median is the equivalent of both
the fifth decile (D5) and the second quartile (Q2). These special percentiles
are summarized in Table 4.1.


TABLE 4.1   Special Percentile Points

NAME                            SYMBOL              PERCENTILE
First Decile                    D1                  P10
Second Decile                   D2                  P20
Third Decile                    D3                  P30
Fourth Decile                   D4                  P40
Fifth Decile                    D5 = Q2 = Mdn       P50
Sixth Decile                    D6                  P60
Seventh Decile                  D7                  P70
Eighth Decile                   D8                  P80
Ninth Decile                    D9                  P90
First (or lower) Quartile       Q1                  P25
Second (or middle) Quartile     Q2 = D5 = Mdn       P50
Third (or upper) Quartile       Q3                  P75
Median                          Mdn = D5 = Q2       P50


In defining Px, no restrictions were placed on the value of x except that
it lie between the limits of 0 and 100. Thus, if x = 7, Px (i.e., P7) represents
the score point below which 7 per cent of the scores in the distribution lie.
If x = 14.73, Px (i.e., P14.73) represents the score point below which 14.73
per cent of the scores lie. Usually, however, we are interested only in those
values of Px for which x is some integer from 1 to 99. Just as the nine points
which divide a distribution into 10 equal sets of scores are called deciles,
the 99 points which divide a distribution into 100 equal sets of scores are
referred to as centiles. The point below which 1 per cent of the scores in the
distribution lie is called the first centile and is designated by the symbol C1.
The point below which 2 per cent of the scores lie is called the second centile
and is designated by C2, etc. Obviously, C1 is the equivalent of P1, C2 of P2,
etc. Centiles, then, are those special Px points for which x takes the values
of the integers 1, 2, 3, ..., 99.

It is important to note that special percentiles (deciles, quartiles,
medians, centiles), like all percentiles, are points on the score scale and are
not, as is sometimes mistakenly thought, intervals along this scale.
Occasionally one hears an individual referred to as being "in" the first or
lower quartile of a particular group on some test when it is intended to
indicate, rather, that he is in (or among) the lowest one-fourth of this group.
The lower quartile is a point on, and not an interval along, the score scale and
it is, therefore, inappropriate to use the name lower quartile to refer to the
lowest one-fourth of a given collection of scores.
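
The equivalences among the special percentile points can be written out as a small Python sketch. The function names are hypothetical and serve only to restate the definitions of this section: each special point is simply Px for a particular x.

```python
def decile_as_percentile(k):
    """Return x such that the k-th decile D_k equals the percentile P_x."""
    if not 1 <= k <= 9:
        raise ValueError("deciles are numbered 1 through 9")
    return 10 * k


def quartile_as_percentile(k):
    """Return x such that the k-th quartile Q_k equals the percentile P_x."""
    if not 1 <= k <= 3:
        raise ValueError("quartiles are numbered 1 through 3")
    return 25 * k


def centile_as_percentile(k):
    """Return x such that the k-th centile C_k equals the percentile P_x."""
    if not 1 <= k <= 99:
        raise ValueError("centiles are numbered 1 through 99")
    return k


MEDIAN_AS_PERCENTILE = 50  # Mdn is P50

# The median, the fifth decile, and the second quartile are the same point.
print(decile_as_percentile(5) == quartile_as_percentile(2) == MEDIAN_AS_PERCENTILE)
```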


4.5 COMPUTATION OF PERCENTILE RANKS CORRESPONDING TO TEST SCORES 


In this section we shall consider the computational problem of
determining the percentile rank of each score point (i.e., unit point) on the
scale of a distribution of test scores. To this end, consider the collection
of 50 scores shown in Table 4.2. These numbers may be regarded as the
measurements derived from the application of some scale to 50 objects. To
make the example as concrete as possible, we shall assume that these scores
represent the number of words correctly spelled by 50 sixth-grade pupils on a
thirty-word spelling test.


TABLE 4.2   Scores of 50 Sixth-Grade Pupils on a Thirty-Word
Spelling Test


19 20 24 21 20 21 20 22 10 
23 19 17 20 19 19 21 21 21 
11 20 18 18 27 19 20 23 25 
18 19 20 19 22 18 23 20 11 
13 18 20 16 20 25 22 19 20 


Since we wish to determine the percentile rank of each unit or score
point, we shall begin the computations by setting up a unit-interval
frequency distribution. This distribution is shown in Table 4.3. Though it
is not necessary to do so, the real limits of each unit interval and the tally
marks made in classifying the scores have been included for sake of
completeness. Next we shall obtain the cumulative frequency associated with
each interval. The cumulative frequency of a given interval is the frequency
of this interval plus the total of the frequencies of all intervals below this
given interval. These cumulative frequencies are shown in Table 4.3 in the
column headed cf. The cf-value of any interval states the number of scores
in the distribution that fall below the upper real limit of that interval.
Hence, expressed as percentages of the total number of scores in the
distribution, these relative cf-values become the percentile ranks of the upper
real limits of their respective intervals. Since our interest is in the score or
unit points (i.e., the interval midpoints) we shall not express these cf-values
as percentages, but shall use them instead as an aid to estimating the
cumulative frequency values associated with the score points. To this end we
shall apply the convention discussed in Section 4.2 of treating the measures
falling in an interval as being evenly distributed throughout that interval.
Hence, the cumulative frequency associated with the midpoint of any
interval may be estimated by adding one-half the f-value for the interval
to the cf-value of the next lower interval. These values have been
determined for each interval in Table 4.3 and are listed in the column headed
cfm (cumulative frequency of midpoint). Now, to find the percentile ranks
of the score points it remains necessary only to express these cfm-values as
percentages of the total number of scores in the collection (in our example
as percentages of 50). The resulting PR-values are shown in Table 4.3.


TABLE 4.3   Distribution of 50 Spelling Test Scores

SCORE     REAL LIMITS OF
POINTS    UNIT INTERVALS    TALLY                f     cf     cfm     PR

  27        26.5-27.5       |                    1     50    49.5     99
  26        25.5-26.5                            0     49    49.0     98
  25        24.5-25.5       ||                   2     49    48.0     96
  24        23.5-24.5       |                    1     47    46.5     93
  23        22.5-23.5       |||                  3     46    44.5     89
  22        21.5-22.5       ||||                 4     43    41.0     82
  21        20.5-21.5       ||||| ||             7     39    35.5     71
  20        19.5-20.5       ||||| ||||| ||      12     32    26.0     52
  19        18.5-19.5       ||||| ||||           9     20    15.5     31
  18        17.5-18.5       |||||                5     11     8.5     17
  17        16.5-17.5       |                    1      6     5.5     11
  16        15.5-16.5       |                    1      5     4.5      9
  15        14.5-15.5                            0      4     4.0      8
  14        13.5-14.5                            0      4     4.0      8
  13        12.5-13.5       |                    1      4     3.5      7
  12        11.5-12.5                            0      3     3.0      6
  11        10.5-11.5       ||                   2      3     2.0      4
  10         9.5-10.5       |                    1      1     0.5      1
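
The three computational steps of this section (cf, then cfm, then PR) can be sketched as follows. The frequencies are those reported in the f column for the spelling-test distribution (score points 10 through 27); the function name `rank_table` and the tuple layout of its rows are assumptions made for illustration only.

```python
# Frequencies for the score points 10, 11, ..., 27 (lowest first),
# as reported in the f column of the spelling-test distribution.
SCORE_POINTS = list(range(10, 28))
FREQUENCIES = [1, 2, 0, 1, 0, 0, 1, 1, 5, 9, 12, 7, 4, 3, 1, 2, 0, 1]


def rank_table(score_points, frequencies):
    """Return rows (score, f, cf, cfm, PR), lowest score first."""
    n = sum(frequencies)
    rows = []
    cf_below = 0                      # cf-value of the next lower interval
    for score, f in zip(score_points, frequencies):
        cf = cf_below + f             # scores below the upper real limit
        cfm = cf_below + f / 2        # scores regarded as below the midpoint
        pr = 100 * cfm / n            # cfm expressed as a percentage of N
        rows.append((score, f, cf, cfm, pr))
        cf_below = cf
    return rows


for score, f, cf, cfm, pr in reversed(rank_table(SCORE_POINTS, FREQUENCIES)):
    print(f"{score:>3} {f:>3} {cf:>3} {cfm:>5.1f} {pr:>4.0f}")
```

For the score point 18, for instance, the row works out to f = 5, cf = 11, cfm = 8.5, and PR = 17.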


4.6 COMPUTATION OF PERCENTILE RANKS FROM GROUPED DATA


Occasionally it may be necessary to estimate the percentile rank of a
score point when the data are available only in the form of a grouped
frequency distribution. To illustrate the procedure to be followed we shall
make use of the grouped frequency distribution shown in Table 4.4. For
the sake of computational convenience, the cf-values for the intervals of
this distribution are also reported in Table 4.4.


TABLE 4.4   A Grouped Frequency Distribution of 50 Test Scores

CLASSES      f     cf

 80-84       1     50
 75-79       3     49
 70-74       2     46
 65-69       4     44
 60-64       0     40
 55-59       3     40
 50-54       7     37
 45-49      10     30
 40-44       6     20
 35-39       4     14
 30-34       0     10
 25-29       0     10
 20-24       4     10
 15-19       2      6
 10-14       3      4
  5-9        1      1


Example 4.1. Estimate the percentile rank of the score point 48 on
the score scale of the distribution of Table 4.4.

Solution. To solve this problem it is necessary to estimate the number
of scores in the distribution which lie below the point 48 on the score scale
and then to express this estimated number as a percentage of the total
number of scores in the distribution (i.e., as a percentage of 50).

First we note that the point 48 falls in the interval which has the real
limits 44.5-49.5 (see Figure 4.1). Referring to the cf-values of Table 4.4,
we see that below the lower real limit of this interval (i.e., below 44.5) there
are 20 scores. Referring to the f-values, we note that 10 additional scores
fall in the interval to which 48 belongs (i.e., the interval 44.5-49.5). To
solve this problem we need to determine how many of these 10 scores lie
between the lower real limit of this interval and the score point 48.
Unfortunately, since the data are grouped, we have no information regarding
the way in which these 10 scores are distributed among the 5 unit intervals
comprising this larger interval (i.e., the unit intervals having the midpoints
45, 46, 47, 48, and 49). Hence, we have to arrive at some estimate of the
number of scores falling between the lower real limit of this interval and the
point 48. Granting that the procedure may provide only a rather crude
approximation, we shall extend the convention developed in Section 4.2
for the case of unit intervals to situations in which the intervals span more
than one unit. Specifically, we shall assume that the 10 scores falling in
the interval 44.5-49.5 are evenly spread throughout this larger interval.
If this be true, the number of scores between 44.5 and 48 should be the same
fractional part of the interval frequency (i.e., of 10) that the distance from
44.5 to 48 (i.e., 3.5) is of the size of the interval (i.e., 5). Now, 3.5 is .7 of
5, and .7 of 10 is 7. Hence, the convention or assumption leads to an
estimate of 7 scores falling between 44.5 and 48, or a total of 27 (i.e., 7 + 20)
scores falling below the score point 48. Since 27 is 54 per cent of the total
number of scores (i.e., of 50), it follows that the estimated PR of 48 is 54.


FIGURE 4.1 A portion of the score scale of the distribution of Table 4.4
(diagram: the interval 44.5-49.5 spans 5 units and contains 10 scores; 20
scores lie below 44.5; the point 48 lies 3.5 units above 44.5. Note: 3.5/5 =
.7 of the interval 44.5-49.5)


Example 4.2. Estimate the PR of 21.

Solution. The point 21 is 1.5 score units above the lower real limit of
the interval in which it falls (i.e., 21 - 19.5 = 1.5). This distance is .3 of
the size of the interval (i.e., 1.5 ÷ 5 = .3). Assuming the 4 scores falling in
this interval to be evenly spread throughout this interval, it follows that
1.2 scores (i.e., .3 of 4 = 1.2) lie between the lower real limit of this interval
and the point 21. The cf up to this interval is 6. Hence, the estimated
number of scores falling below 21 is 7.2 (i.e., 1.2 + 6). Therefore, the
estimated PR of 21 is 14.4 (i.e., 7.2 expressed as a percentage of 50). Since
the procedure employed in estimating the number of scores between the
score point involved and the lower real limit of the interval in which it falls
is based on an assumption that may not be too well satisfied, there is little
point in retaining decimal fractions in reporting percentile ranks. To do
so is to pretend a degree of accuracy which cannot be defended. Hence, it
is recommended that percentile ranks computed by the procedure
illustrated be rounded to the nearest whole-number percentage. In our example,
then, the estimated PR of 21 should be reported as 14.


Example 4.3. Estimate the PR of 62. 

Solution. There are no scores located in the interval in which the score 
point 62 falls. Hence, there can be no scores between the lower real limit 
of this interval and the point 62, and the total number of scores falling 
below 62 is, therefore, the cf-value for this interval (i.e., 40). Hence, the 
PR of 62 is 80 (i.e., 40 expressed as a percentage of 50). It is important
to note that 40 scores fall below any score point within this empty interval 
(59.5-64.5). Hence, the PR of any point between 59.5 and 64.5 is the same 
(i.e., 80). 
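
The procedure of Examples 4.1 to 4.3 can be sketched in Python as follows. The class limits and frequencies are those of Table 4.4; the name `grouped_percentile_rank` and the representation of each class as a (lower score limit, upper score limit, frequency) triple are assumptions made for this illustration.

```python
# (lower score limit, upper score limit, frequency), lowest class first.
TABLE_4_4 = [
    (5, 9, 1), (10, 14, 3), (15, 19, 2), (20, 24, 4),
    (25, 29, 0), (30, 34, 0), (35, 39, 4), (40, 44, 6),
    (45, 49, 10), (50, 54, 7), (55, 59, 3), (60, 64, 0),
    (65, 69, 4), (70, 74, 2), (75, 79, 3), (80, 84, 1),
]


def grouped_percentile_rank(classes, point):
    """Estimated PR of a score point, unrounded.

    The scores in each class are assumed to be evenly spread between the
    real limits of that class.
    """
    n = sum(f for _, _, f in classes)
    below = 0.0
    for low, high, f in classes:
        lower, upper = low - 0.5, high + 0.5          # real limits
        # Fraction of this class lying below the point, held within 0..1.
        fraction = (point - lower) / (upper - lower)
        fraction = min(max(fraction, 0.0), 1.0)
        below += f * fraction
    return 100 * below / n


for point in (48, 21, 62):
    pr = grouped_percentile_rank(TABLE_4_4, point)
    print(point, round(pr))   # reported to the nearest whole number
```

The unrounded results are approximately 54, 14.4, and 80, in line with the three worked examples; the rounding to a whole-number percentage follows the recommendation made in Example 4.2.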


4.7 COMPUTATION OF PERCENTILES


We shall now consider the problem of estimating the value of Px. This
simply means that we must determine as closely as we can the location of
the point on the scale below which x per cent of the scores lie. As in the
case of estimating percentile ranks, we shall again employ the convention
developed in Section 4.2. That is, we shall treat the scores classified in an
interval as being evenly spread throughout this interval.


Example 4.4. Estimate the median (i.e., P50) of the distribution of
Table 4.3.

Solution. In this example we seek the point on the score scale below
which one-half of the scores in this distribution fall. Since there are 50
scores in the collection, this means that we seek the score point below which
25 of the scores lie. We note from the cf-values of Table 4.3 that 20 scores
fall below the score point 19.5 and that 32 scores fall below the point 20.5.
Hence, the required point, that is, the point below which 25 scores fall, must
lie somewhere between 19.5 and 20.5 (see Figure 4.2). We cannot, of
course, determine the exact location of this point inasmuch as we do not
know how the 12 scores falling between 19.5 and 20.5 are actually
distributed within this interval. However, if we regard these 12 scores as
being evenly spread between 19.5 and 20.5, then the lower 5 of them (i.e.,
25 minus 20, or the number of scores below the point we seek to determine
minus the number of scores below the lower real limit of the interval in
which this point is known to fall) will fall in the lower five-twelfths or .42
of this interval. Hence, the estimated location of the required point is
19.5 + .42 or 19.92.


FIGURE 4.2 A portion of the score scale of Table 4.3
(diagram: 20 scores lie below the point 19.5 and 32 scores below the point
20.5; 12 scores fall in this interval; P50 = 19.9. Note: 5/12 of 1 = .4)


Since the procedure just described is based on an assumption which
may not be too well satisfied, there is no point in pretending a greater
degree of accuracy than one can reasonably hope to attain. It is, of course,
possible with a unit-interval frequency distribution to indicate definitely
the location of Px to the nearest unit point. Thus, in the above example,
we know definitely that P50 lies between 19.5 and 20.5 and hence is nearer
to 20 than to either 19 or 21. Even though our convention may not be in
close conformity with the actual situation, it must, if it is to be useful at
all, enable us to achieve an estimate of Px with a somewhat finer degree of
accuracy than is provided by the score scale involved. That is, we should
be able to locate Px more precisely than to the nearest unit point. It is
impossible to say just what degree of precision is achieved by the method
of estimating percentile points which we have employed here. However,
since the next most convenient finer division of the measuring scale would
be tenths of a unit, we shall recommend the retention of the tenths place in
reporting percentiles. This may well represent a finer division than is
justifiable, but it is a convenient division to employ in view of the basic
character of our numbering system. It is, then, primarily on the grounds of
convenience that our recommendation must be justified. If this suggestion
is applied to the result of the foregoing example, the value of P50 would
be reported as 19.9.


of estimating реге 


Example 4.5. Estimate the value of the third quartile (P75) in the
distribution of Table 4.3.

Solution. The third quartile is the point below which 75 per cent or
37.5 scores fall. From the cf-values we note that 32 scores fall below 20.5,
while 39 scores fall below 21.5. Hence, the point below which 37.5 scores
fall is somewhere in the interval 20.5 to 21.5. If the 7 scores falling in this
interval are regarded as evenly spread throughout it, then the lower 5.5
of these 7 scores (37.5 - 32 = 5.5) must fall in the lower 5.5-sevenths or .8
of this interval. It follows then that the estimated value of P75 is 21.3
(i.e., 20.5 + .8).
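
The interpolation step used in Examples 4.4 and 4.5 reduces to one line of arithmetic, sketched below in Python. The function name `point_within_interval` and its parameter names are hypothetical; the numbers passed in are the ones read from the cf-values in the two examples.

```python
def point_within_interval(lower_limit, width, f, cf_below, target_count):
    """Score point below which `target_count` scores fall.

    Assumes the f scores of the interval are evenly spread over its width
    and that the target count lies within this interval.
    """
    if f <= 0:
        raise ValueError("the interval must contain at least one score")
    if not cf_below <= target_count <= cf_below + f:
        raise ValueError("the target count does not fall in this interval")
    return lower_limit + width * (target_count - cf_below) / f


# Example 4.4: 25 scores below the median; 20 below 19.5; 12 in 19.5-20.5.
median = point_within_interval(19.5, 1, 12, 20, 25)
# Example 4.5: 37.5 scores below P75; 32 below 20.5; 7 in 20.5-21.5.
third_quartile = point_within_interval(20.5, 1, 7, 32, 37.5)

print(round(median, 1), round(third_quartile, 1))   # 19.9 21.3
```

Rounding to the tenths place follows the recommendation given after Example 4.4.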


Example 4.6. Estimate the value of P25 in the distribution of Table
4.4.

Solution. The procedure followed when the data are organized into a
grouped frequency distribution is basically the same as that which we have
employed in the foregoing examples. We seek the point below which 25 per
cent or 12.5 scores (i.e., 25 per cent of 50) fall. The cf-values of Table 4.4
show that 10 scores fall below 34.5 and 14 scores fall below 39.5. Hence,
the point below which 12.5 scores fall lies somewhere in the interval 34.5
to 39.5 (see Figure 4.3). Assuming the 4 scores falling in this interval to be
evenly spread throughout it, the lower 2.5 (12.5 - 10 = 2.5) of these 4
scores must fall in the lower 2.5-fourths of this interval. That is, the lower
2.5 of these 4 scores must fall in that segment of the interval which extends
3.1 units above 34.5, since 2.5-fourths of 5 (the size of the interval) is 3.1.
Therefore, the estimated value of P25 is 37.6 (i.e., 34.5 + 3.1).


FIGURE 4.3 A portion of the score scale of the distribution of Table 4.4
(diagram: the interval 34.5-39.5 spans 5 units and contains 4 scores; 10
scores lie below 34.5 and 14 scores below 39.5; P25 = 37.6 lies 3.1 units
above 34.5. Note: 2.5/4 of 5 = 3.1)


Example 4.7. Estimate the value of P55 in the distribution of Table 4.4. 

Solution. P55 is the point below which 55 per cent or 27.5 scores fall. 
From the cf-values of Table 4.4 we see that 20 scores fall below the point 
44.5 and 30 scores below 49.5. Hence, the required score point lies some- 
where in the interval 44.5 to 49.5. Assuming the 10 scores falling in this 
interval to be evenly spread throughout it, the lower 7.5 of these 10 scores 
must lie in the lower 7.5-tenths of this interval. Since 7.5-tenths of 5 (the
size of the interval) is 3.8, it follows that these 7.5 scores must fall in that
segment of this interval which extends 3.8 units upward from 44.5. Hence,
the estimated value of P55 is 48.3 (i.e., 44.5 + 3.8 = 48.3).


Example 4.8. Estimate the value of D4 in the distribution of Table 4.4.

Solution. D4 is the score point below which 40 per cent, or 20, of the
scores fall. From the cf-values of Table 4.4 we see that 20 scores fall below
the point 44.5. Hence, the estimated value of D4 is 44.5.

It should be observed that when the number of scores associated with a
particular percentile point appears as a unique value in the cf-column, then
the estimate of the percentile point is the upper real limit of this interval.



4.8 INDETERMINATE PERCENTILES 


When a frequency distribution involves empty intervals, certain 
percentiles are indeterminate in the sense that an unlimited number of 
points exist which satisfy their definitions. 


Example 4.9. Estimate D8 in the distribution of Table 4.4.

Solution. Here we seek the score point below which 80 per cent, or 40,
of the scores in the distribution fall. From the cf-values of Table 4.4 we
see that 40 scores fall below the point 59.5. However, since the interval
59.5 to 64.5 is empty (i.e., has a frequency value of zero), it follows that 40
scores also fall below the point 64.5. In fact, 40 scores fall below any of the
infinity of points between 59.5 and 64.5, and hence all of these points satisfy
the definition of D8.

It is customary, nevertheless, in situations of this type to report a
single value for the required percentile. The value reported is that which
lies midway between the two extreme possible values. In Example 4.9 the
extreme possible values are 59.5 and 64.5. The point midway between these
values is 62. Hence, D8 is reported as having the value 62. It should be
observed that the selection of 62 from among the infinity of possible points
satisfying the definition of D8 is purely arbitrary.


Example 4.10. Estimate P8 in the distribution of Table 4.3.

Solution. Here we seek the score point below which 8 per cent, or 4, of
the scores in this distribution lie. From the cf-values of Table 4.3 we see
that 4 scores fall below the point 13.5. However, since the next two higher
intervals both have zero frequencies, it follows that 4 scores also fall below
the point 15.5. Hence, all points between 13.5 and 15.5 satisfy the definition
of P8. The value midway between these possible extremes is 14.5. Hence,
in accordance with the arbitrary practice suggested in Example 4.9, we
shall report 14.5 as the value of P8.
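
The handling of indeterminate percentiles in Examples 4.9 and 4.10 can be sketched in Python as follows. The sketch is an illustration only: the names `percentile_range` and `reported_percentile` are hypothetical, each interval is represented by its real limits and frequency, and the frequencies are those of the two distributions used in the examples.

```python
# Intervals as (lower real limit, upper real limit, f), lowest first.
GROUPED = [(lo - 0.5, hi + 0.5, f) for lo, hi, f in [
    (5, 9, 1), (10, 14, 3), (15, 19, 2), (20, 24, 4), (25, 29, 0),
    (30, 34, 0), (35, 39, 4), (40, 44, 6), (45, 49, 10), (50, 54, 7),
    (55, 59, 3), (60, 64, 0), (65, 69, 4), (70, 74, 2), (75, 79, 3),
    (80, 84, 1)]]                                     # Table 4.4
UNIT = [(s - 0.5, s + 0.5, f) for s, f in zip(
    range(10, 28),
    [1, 2, 0, 1, 0, 0, 1, 1, 5, 9, 12, 7, 4, 3, 1, 2, 0, 1])]  # Table 4.3


def percentile_range(intervals, x):
    """Lowest and highest points satisfying the definition of Px."""
    n = sum(f for _, _, f in intervals)
    target = n * x / 100
    cf = 0
    for i, (lower, upper, f) in enumerate(intervals):
        if f > 0 and cf < target <= cf + f:
            low = high = lower + (upper - lower) * (target - cf) / f
            if target == cf + f:
                # Extend across any empty intervals lying immediately above.
                for _, next_upper, next_f in intervals[i + 1:]:
                    if next_f != 0:
                        break
                    high = next_upper
            return low, high
        cf += f
    raise ValueError("x must be greater than 0 and at most 100")


def reported_percentile(intervals, x):
    """Single reported value: midway between the two extreme possibilities."""
    low, high = percentile_range(intervals, x)
    return (low + high) / 2


print(percentile_range(GROUPED, 80), reported_percentile(GROUPED, 80))
print(percentile_range(UNIT, 8), reported_percentile(UNIT, 8))
```

For percentiles that are not indeterminate the two extremes coincide, and the reported value is simply the interpolated point.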


4.9 THE USE OF THE OGIVE IN ESTIMATING PERCENTILE RANKS
AND PERCENTILES


In this section we shall consider a scheme for representing a cumulative
frequency distribution graphically. We shall also show how such a graph
may be employed in estimating percentiles and percentile ranks. The graph
of a cumulative frequency distribution is known as an ogive.

As a first step in constructing an ogive, lay out a set of axes similar to
those used in preparing a polygon. Along one of these axes mark off a scale
of values corresponding to the variable (scores) involved. Along the other
mark off a scale extending from zero to the largest cf-value (i.e., to the N
of the distribution). It is customary, but not essential, to place the score
scale along the horizontal axis and the cf-scale along the vertical axis. Then
locate points at heights representing the cf-values and above the
corresponding points on the score scale. We have previously observed that
these cf-values represent cumulations of frequencies up to the upper real
limits of each interval along the score scale. Hence, these points must be
located above the upper real limits of the intervals rather than above the
midpoints as was the case in preparing a polygon. As a final step, connect
these points by straight lines, bringing the picture to the horizontal (i.e., the
score) axis at the lower real limit of the bottom interval, at which point the
cf-value is zero. The marking off of the scales and the plotting of points
will be greatly facilitated if squared paper is employed. The ogive of the
distribution of Table 4.3 is shown in Figure 4.4.


FIGURE 4.4 Ogive of 50 spelling-test scores of the distribution of
Table 4.3 (horizontal axis: X-Scale, i.e., Test Score Scale)


If, instead of plotting points at heights representing the cf-values, we
had plotted points at heights representing relative cumulative frequency
values (i.e., cf-values expressed as proportions or preferably as percentages
of N) it would have been possible to use the resulting ogive to estimate
percentile ranks or percentiles. Table 4.5 contains selected columns from
Table 4.3, together with a new column giving the relative cumulative
frequencies (rcf) of the upper real limits of each interval. These rcf-values in
Table 4.5 have been expressed as percentages. Figure 4.5 shows the ogive
for these relative cumulative frequencies. The only difference between the
ogives of Figures 4.4 and 4.5 is that in the latter the cumulative frequency
scale is marked off in terms of percentages (i.e., from 0 to 100) instead of
actual cumulative frequency counts.


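TABLE 4.5   Distribution of 50 Spelling Test Scores

REAL LIMITS OF
UNIT INTERVALS       f     cf    rcf

  26.5-27.5          1     50    100
  25.5-26.5          0     49     98
  24.5-25.5          2     49     98
  23.5-24.5          1     47     94
  22.5-23.5          3     46     92
  21.5-22.5          4     43     86
  20.5-21.5          7     39     78
  19.5-20.5         12     32     64
  18.5-19.5          9     20     40
  17.5-18.5          5     11     22
  16.5-17.5          1      6     12
  15.5-16.5          1      5     10
  14.5-15.5          0      4      8
  13.5-14.5          0      4      8
  12.5-13.5          1      4      8
  11.5-12.5          0      3      6
  10.5-11.5          2      3      6
   9.5-10.5          1      1      2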


To show how an ogive may be employed to estimate percentile ranks, 
We shall use Figure 4.5 to determine the percentile rank of the seore point 
18. First locate the point on the ogive above the score point 18 (see line f 
in Figure 4.5). Then locate the point on the ref-seale that corresponds to 
this point on the ogive (see line B in Figure 4.5). The value of this point on 
the ref-seale (i.e, 17) is the estimated percentile rank of 18. Reference to 
Table 4.3 shows this result to be in accord with that previously obtained. 

To use the ogive of Figure 4.5 to estimate the median of the distribution
involved, first locate the point on the ogive opposite the point 50 on the
rcf-scale (see line C in Figure 4.5). Then locate the point on the score scale
that lies directly below this point on the ogive (see line D in Figure 4.5).
The value of this point on the score scale (i.e., 19.9) is the estimated median
of the distribution. Reference to the solution of Example 4.4 shows the
result read from the ogive to be in agreement with that previously obtained.
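The two graphical readings just described can be imitated numerically by interpolating in a straight line between the plotted ogive points. The following is only a rough sketch in Python. It assumes that the unit intervals of Table 4.5 run from score 10 to score 27 (an alignment inferred from the worked results, not stated outright), and the function names are hypothetical.

```python
# Frequencies f of Table 4.5, listed from the lowest unit interval upward.
# ASSUMPTION: the lowest interval is 9.5-10.5 (score 10); this alignment is
# inferred from the worked results in the text.
freqs = [1, 2, 0, 1, 0, 0, 1, 1, 5, 9, 12, 7, 4, 3, 1, 2, 0, 1]
lowest_lower_limit = 9.5
n = sum(freqs)

# Ogive points: (upper real limit, rcf in per cent), preceded by the
# starting point (lowest lower real limit, 0).
points = [(lowest_lower_limit, 0.0)]
cum = 0
for i, f in enumerate(freqs):
    cum += f
    points.append((lowest_lower_limit + i + 1, 100.0 * cum / n))


def percentile_rank(x):
    """Read the rcf-scale value of the ogive point above score point x."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("score point lies outside the plotted range")


def percentile(p):
    """Read the score-scale value below the ogive point opposite rcf value p."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= p <= y1 and y1 > y0:
            return x0 + (x1 - x0) * (p - y0) / (y1 - y0)
    raise ValueError("p must lie between 0 and 100")


print(round(percentile_rank(18), 1))  # 17.0
print(round(percentile(50), 1))       # 19.9
```

Under the stated assumption the two readings agree with the values 17 and 19.9 obtained from the figure.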




[Figure 4.5: ogive plotted with the rcf-Scale (0 to 100) on the vertical axis
and the X-Scale on the horizontal axis; the median reading of 19.9 is marked
on the X-Scale.]


Figure 4.5 Ogive of 50 spelling-test scores 


If an ogive is constructed with care and to a sufficiently large scale, it
is possible to use it in estimating percentile ranks and percentiles with as
much accuracy as can be justified (i.e., percentile ranks to the nearest whole-
number percentage and percentiles to the nearest tenth of a score unit).


[Figure 4.6: ogive plotted with the rcf-Scale (0 to 100) on the vertical axis
and the X-Scale (30 to 90) on the horizontal axis.]


FIGURE 4.6 Ogive of the distribution of Table 4.4 




If a number of percentile ranks and percentiles are to be determined for a 
given set of data, the use of an ogive for this purpose is both simple and 
efficient. 

Figure 4.6 is the ogive of the relative cumulative frequencies of the
distribution of Table 4.4. Note that the points are plotted above the upper
real limits of each interval. This ogive has been used to estimate (1) the
percentile rank of the score point 48 (see lines A and B in Figure 4.6) and
(2) the value of P25 (see lines C and D in Figure 4.6). The results which
are shown in Figure 4.6 are in agreement with those previously obtained in
the solutions of Examples 4.1 and 4.6 respectively.


4.10 POPULATION PERCENTILE RANKS AND PERCENTILES 


In Section 2.4 we gave some consideration to the problem of making 
an inference regarding the form of a population distribution from an inspec- 
tion of the frequency distribution of a sample of scores taken from that 
population. In this section we shall consider a crude but nevertheless useful 
technique for estimating percentile ranks and percentiles of a population 
distribution from the distribution of scores for a sample. 

It was observed in Section 2.4 that when the data involved are measurements
of a continuous attribute, and when the population itself is extremely
large and composed of individuals representing all shades of variations in
the amount of the attribute they possess, then the population polygon
would approach a smooth curve. If this is the case, it follows that the
population ogive would also approach a smooth curve. Hence, just as it is
possible to obtain a more highly generalized picture of the population
distribution by “smoothing” the sample polygon, so is it possible to obtain
a more highly generalized picture of the relative cumulative frequency
distribution of the population by “smoothing” the sample ogive. As was
suggested in the case of the polygon, one simple means of accomplishing
this is to draw “free-hand” a smooth curved line which comes as close as is
reasonably possible to passing through all of the points used in plotting the
sample ogive.

Consider, as an illustrative application, the problem of establishing
percentile norms. Norms are intended to be descriptive of the performance
of a specified group or population of individuals on a particular test. In
other words, norms are statements of a quantitative character descriptive
of a population frequency distribution of test scores. There are many ways
in which these quantitative statements can be expressed. Percentiles and
percentile ranks for such a population distribution of test scores are known
as percentile norms. Such norms make possible the interpretation or
evaluation of the single score made by a given individual member of the
population in relation to, or in comparison with, the scores made by the other
members of the population.




Since percentile norms are population values, and since it is ordinarily 
impossible to administer a test to all members of a population, it follows 
that the percentile norms reported for a test can usually be nothing more 
than estimates based on a distribution of scores obtained for a sample of 
individuals presumed to be representative of the particular population in 
question. Because the individuals comprising a population differ, and be- 
cause chance, or uncontrolled, influences always play some part in determin- 
ing which of these differing individuals are to constitute the sample, it 
follows that the sample distribution may be expected to differ to some 


TABLE 4.6 Distribution and Percentile Ranks of Scores on
Vocabulary Test for 2,000 Iowa Eleventh-Grade Pupils

                                                   PR         PR
                            cf TO                (SAMPLE,  (SMOOTHED
  X      f     cf    rcf   MIDPOINT     PR       ROUNDED)    OGIVE)
 29     12   2000  100.0    1994.0     99.7        100        100
 28     33   1988   99.4    1971.5     98.6         99         99
 27      4   1955   97.8    1953.0     97.7         98         98
 26     31   1951   97.6    1935.5     96.8         97         97
 25      7   1920   96.0    1916.5     95.8         96         96
 24     55   1913   95.7    1885.5     94.3         94         94
 23     87   1858   92.9    1814.5     90.7         91         92
 22     30   1771   88.6    1756.0     87.8         88         89
 21     59   1741   87.1    1711.5     85.6         86         85
 20    150   1682   84.1    1607.0     80.4         80         81
 19     51   1532   76.6    1506.5     75.3         75         76
 18    130   1481   74.1    1416.0     70.8         71         70
 17    129   1351   67.6    1286.5     64.3         64         64
 16    180   1222   61.1    1132.0     56.6         57         57
 15    138   1042   52.1     973.0     48.7         49         49
 14    210    904   45.2     799.0     40.0         40         40
 13    155    694   34.7     616.5     30.8         31         32
 12     35    539   27.0     521.5     26.1         26         25
 11    103    504   25.2     452.5     22.6         23         20
 10    125    401   20.1     338.5     16.9         17         15
  9     90    276   13.8     231.0     11.6         12         10
  8     90    186    9.3     141.0      7.1          7          7
  7     10     96    4.8      91.0      4.6          5          5
  6     30     86    4.3      71.0      3.6          4          4
  5      4     56    2.8      54.0      2.7          3          3
  4     28     52    2.6      38.0      1.9          2          2
  3      9     24    1.2      19.5      1.0          1          1
  2     15     15    0.8       7.5      0.4          0          0
      2000




extent from that of the population. If this is the case, the sample cumulative
frequency distribution from which the estimates of the percentile norms
are derived will also differ to some extent from the population cumulative 
frequency distribution. As has been suggested, one possible means of 
minimizing such differences consists of "smoothing" the sample ogive. The 
estimated percentile norms may then be read from this smoothed ogive. 

As an example, suppose it is required to estimate percentile norms for
Iowa eleventh-grade pupils on a given vocabulary test. Let 2,000 pupils
enrolled in the eleventh grade in Iowa high schools be selected as a sample 
to represent this population. The frequency distribution of the scores of 
these 2,000 pupils on the given test is shown in Table 4.6. This table also 
gives the percentile ranks for this particular sample, determined by the 
method described in Section 4.5, and the estimated percentile norms read 
from the smoothed ogive shown in Figure 4.7. While the latter are not 
markedly different from the former, they are probably somewhat superior 
as estimates of the corresponding population values. 
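The sample percentile ranks of Table 4.6 can be checked with a few lines of Python. This is a sketch only: it takes the percentile rank of a score to be the cumulative frequency up to the midpoint of its unit interval expressed as a percentage of N, which is how the tabled values behave. The function name is hypothetical, and the frequencies for scores 2 and 25 are read as 15 and 7 on the evidence of the cumulative frequency column.

```python
# Scores 2..29 and their frequencies, transcribed from Table 4.6
# (lowest score first). ASSUMPTION: f = 15 for score 2 and f = 7 for
# score 25, as implied by the cf column.
scores = list(range(2, 30))
freqs = [15, 9, 28, 4, 30, 10, 90, 90, 125, 103, 35, 155, 210, 138,
         180, 129, 130, 51, 150, 59, 30, 87, 55, 7, 31, 4, 33, 12]


def sample_percentile_ranks(scores, freqs):
    """PR of each score = 100 * (cf below its interval + f/2) / N."""
    total = sum(freqs)
    ranks = {}
    below = 0  # cumulative frequency below the current interval
    for x, f in zip(scores, freqs):
        ranks[x] = 100.0 * (below + f / 2) / total
        below += f
    return ranks


pr = sample_percentile_ranks(scores, freqs)
for x in (29, 19, 13, 3):
    print(x, round(pr[x], 1))
```

The smoothed-ogive norms in the last column of the table cannot be reproduced this way, since they depend on a curve drawn free-hand.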


[Figure 4.7: smoothed ogive plotted with the rcf-Scale (0 to 100) on the
vertical axis and the X-Scale (to 30) on the horizontal axis.]

FIGURE 4.7 Smoothed ogive based on the relative cumulative frequencies of
a sample of vocabulary-test scores made by 2,000 Iowa eleventh-grade pupils


4.11 OVERLAPPING DISTRIBUTIONS 


The study of differences among individuals has long been, and continues
to be, a matter of considerable importance to psychologists and
educators. That individuals differ is a fact observable to all. The extent
of such differences, however, is often not fully appreciated. On first thought,




TABLE 4.7 Distributions of Scores Made by Groups of Third-,
Fourth-, and Fifth-Grade Pupils on a Vocabulary Test

  X     f3    rcf3     f4    rcf4     f5    rcf5
 40                                    5   100.0
 39                     2   100.0     14    99.0
 38                     4    99.6     14    96.2
 37                     5    98.7     16    93.5
 36      2   100.0     12    97.7     20    90.3
 35      2    99.5     11    95.1     25    86.3
 34      1    99.0     10    92.8     23
 33      1    98.7     10    90.7     32
 32      0    98.4     17    88.6     28
 31      5    98.4     13    85.0     34
 30      6    97.1     12    82.3     30
 29      8    95.5     16    79.7     28
 28      6    93.5     19    76.4     20
 27      7    91.9     25    72.4     18
 26      8    90.1     20    67.1     31
 25      6    88.0     21    62.9     18
 24      8    86.4     20    58.4     17
 23      9    84.3     18    54.2     18
 22     12    81.9     24    50.4      3
 21     10    78.8     14    45.4     18
 20     15    76.2     20    42.4      7
 19      8    72.3     14    38.2     11
 18     13    70.2     21    35.2      9
 17     10    66.8     17    30.8     10
 16     20    64.1     13              8
 15     16    58.9     14
 14     19    54.7     11             11
 13     23    49.7     12             12
 12     21    43.7     13              4
 11     24    38.2     16              2
 10     30    31.9     11              4
  9     23    24.1     12              2
  8     22    18.1     11              3
  7     16    12.3     10              1
  6     11     8.1      0              0
  5      8     5.2      2              1
  4      7     3.1      0              0
  3      0     1.3      2              0
  2      4     1.3      0




it would appear that a satisfactory method of studying, say, differences in
vocabulary among fourth-grade pupils, would require little more than an
inspection of the polygon of the frequency distribution of scores for a group
of fourth-grade pupils on a vocabulary test. Unfortunately, however, such
a vocabulary-test scale, unlike a scale of, say, heights, is not a fundamental
measuring scale, and hence differences along it are not meaningful in the
same sense as differences along a fundamental scale. This being the case,
some other method is needed of describing or evaluating individual
differences with respect to a trait measured by a rank-order scale.

A device sometimes employed to this end in the school-grade situation
involves the determination of the extent to which overlapping occurs in a
given ability for different grade levels. Thus, if among fourth-grade pupils
we find some whose vocabularies are on a par with those of typical fifth-,
sixth-, seventh-, or even eighth-grade pupils, and others whose vocabularies
are only on a par with those of typical pupils in lower grades, we have a
better picture of the extent to which individual differences in vocabulary
exist. If we can take the additional step of indicating the various
percentages of, say, fourth-grade pupils whose performances surpass the fifth-
or sixth-grade medians on a vocabulary test, or fall below the third- or
second-grade medians, we shall have made our description of individual
differences even more concrete. This can best be accomplished by placing
the ogives for the score distributions of the various grades on the same axes
and reading the required percentages from the resulting diagram. Just how
this may be done can best be shown by presenting a specific example.

Table 4.7 gives the frequency distributions and the relative cumulative
frequency distributions for the scores on a vocabulary test made by samples
of 382 third-grade pupils, 474 fourth-grade pupils, and 505 fifth-grade
pupils. Figure 4.8 shows the smoothed ogives of these relative cumulative
frequency distributions plotted on the same axes. The ogives were smoothed
because these groups were selected as samples to represent their respective
grade populations. Hence, statements about the percentages by which the
fourth-grade distribution overlaps the third- and fifth-grade distributions
may be generalized to the fourth-grade population, that is, may be viewed
as population estimates.

We shall now show how such estimates may be made from Figure 4.8.
Consider, for example, the problem of estimating the percentage of fourth-
grade pupils whose vocabulary test scores are lower than the third-grade
median. The required estimate may be made in two steps as follows: (1)
using the Grade 3 ogive, estimate P50 or the median for the third grade;
then (2), using the Grade 4 ogive, estimate the percentile rank of this score
point in the fourth-grade group. Inspection of Figure 4.8 shows that for
Grade 3, P50 = 13.7 (see lines A and B in this figure) and that the percentile
rank of 13.7 in the fourth-grade group is 19.3 or 19 (see lines B and C in
this figure). Hence, 19 is the estimated percentage of fourth-grade pupils




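[Figure 4.8: three smoothed ogives plotted on the same axes, with the
rcf-Scale (0 to 100) on the vertical axis and the X-Scale (0 to 40) on the
horizontal axis.]

FIGURE 4.8 Ogives of distributions of vocabulary test scores for groups
of third-, fourth-, and fifth-grade pupils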


whose vocabulary scores on this test fall below the median score for Grade
3. If it can be assumed that this test provides an accurate measure of
vocabulary power, then it can be inferred that approximately 19 per cent
of the individuals comprising the fourth-grade population are at a stage of
vocabulary development which is below that of the typical third-grade
pupil. Moreover, approximately 8 per cent of the fourth grade are retarded
to a degree placing them in the lower one-fourth of the third-grade
population (see lines D, E, and F in Figure 4.8).

It should be observed that it is possible to estimate the percentage of
Grade 4 pupils below any Grade 3 percentile point without first estimating
the value of the Grade 3 percentile point. Thus, the estimated 19 per cent




of the fourth-grade population which lies below P50 for Grade 3 could have
been read from Figure 4.8 without noting that for Grade 3, P50 = 13.7. To
accomplish this, first locate the point on the Grade 3 ogive opposite 50 on
the rcf-scale (see line A in Figure 4.8). Then, locate the point on the
Grade 4 ogive which is directly below this point (see line B). Finally, locate
the point on the rcf-scale directly opposite this point on the Grade 4 ogive
(see line C). This point indicates the required percentage (i.e., 19.3 or 19).

We shall next direct our attention to the individuals in the Grade 4
population whose vocabularies are developed to an advanced level. To this
end we shall estimate the percentage of the Grade 4 population that lies
above the median of the fifth-grade group. This may be done by first
using the procedure described above to estimate the percentage of the Grade
4 population which is below the Grade 5 median. This percentage is
approximately 78 (see lines 1, G, and H in Figure 4.8). Hence, the estimated
percentage of the Grade 4 population which is above P50 for Grade 5 is 22
(i.e., 100 − 78 = 22). Similarly, it can be shown that approximately 10
per cent of the Grade 4 population lies above the value of P75 for the Grade
5 population (see lines Z, J, and K in Figure 4.8).
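The two-step reading used in this section (a percentile point from one ogive, then its percentile rank on another) may be sketched in Python as follows. The ogive points here are hypothetical, made-up values chosen only to illustrate the procedure; they are not the values of Table 4.7, and straight-line interpolation stands in for the smoothed curves of Figure 4.8.

```python
def interpolate(x, xs, ys):
    """Straight-line interpolation of y at x along points (xs[i], ys[i]).

    xs must be in ascending order.
    """
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= x <= x1:
            if x1 == x0:
                return y0
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("value outside the plotted range")


# HYPOTHETICAL ogive points: upper real limits and rcf-values (per cent)
# for a lower and an upper grade. These numbers are invented.
limits = [4.5, 9.5, 14.5, 19.5, 24.5, 29.5]
rcf_lower_grade = [0.0, 20.0, 55.0, 80.0, 95.0, 100.0]
rcf_upper_grade = [0.0, 5.0, 25.0, 50.0, 80.0, 100.0]

# Step 1: read the lower grade's median (P50) from its ogive, that is,
# find the score point opposite 50 on the rcf-scale.
median_lower = interpolate(50.0, rcf_lower_grade, limits)

# Step 2: read the percentile rank of that score point on the upper
# grade's ogive.
pct_upper_below = interpolate(median_lower, limits, rcf_upper_grade)

print(round(median_lower, 2), round(pct_upper_below, 1))
```

The percentage of the upper group lying above a point is then simply 100 minus the percentage lying below it, as in the computation 100 − 78 = 22.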


TABLE 4.8 Estimated Percentages of Fourth-Grade Population

 BELOW P25       BELOW P50       ABOVE P50       ABOVE P75
FOR GRADE 3     FOR GRADE 3     FOR GRADE 5     FOR GRADE 5
     8               19              22              10


The results of our findings, which have been summarized in Table 4.8,
provide a striking and meaningful description of the extent of individual
differences in vocabulary as they exist in a Grade 4 population. The
description could, of course, be presented in greater detail by estimating
such percentages for additional percentile points and by extending the
study of overlap to still lower and higher grade levels.


4.12 DISTANCES BETWEEN SPECIAL PERCENTILE POINTS


Figure 4.9 shows the smoothed polygon of an idealized population of
measures of a continuous attribute. The distribution is unimodal and
symmetrical. The nine decile points have been marked on the score scale.
Inspection of this figure shows clearly that the distances between these
decile points are not uniform. The actual differences between these decile
points are reported in Table 4.9.


It will be recalled that the deciles are the nine points on the score scale
which divide the distribution into ten equal-sized subgroups. Therefore, it
follows that the distances between deciles will be largest at those portions of
the scale where the frequencies are smallest, and smallest at those portions
of the scale where frequencies are largest. This principle is also applicable
to quartile and centile points. Figure 4.10 with Table 4.10 and Figure 4.11
with Table 4.11 illustrate this principle in the case of different types of
idealized population distributions.


[Figure 4.9: smoothed frequency polygon with the nine decile points D1 to D9
marked along the Score Scale.]

FIGURE 4.9 Smoothed polygon of idealized population (bell-shaped) of measures
of a continuous attribute showing locations of nine decile points


TABLE 4.9 Decile Points and Interdecile Differences (Bell-Shaped Distribution)


TABLE 4.10 Decile Points and Interdecile Differences (Skewed Distribution)

DECILES    POINTS    DIFFERENCES
  D9        45.6
                         4.2
  D8        41.4
                         2.5
  D7        38.9
                         1.7
  D6        37.2
                         1.2
  D5        36.0
                         0.9
  D4        35.1
                         0.7
  D3        34.4
                         0.8
  D2        33.6
                         1.0
  D1        32.6


[Figure 4.10: smoothed frequency polygon with the nine decile points D1 to D9
marked along the Score Scale.]

FIGURE 4.10 Smoothed polygon of idealized population (positively skewed) of
measures of a continuous attribute showing locations of nine decile points




[Figure 4.11: smoothed frequency polygon plotted along the Score Scale.]


Figure 4.11 Smoothed polygon of idealized population (U-shaped) of 
measures of a continuous attribute showing locations of nine decile points 


Consideration of this principle should serve to dispel the misconception,
not infrequently held by beginning students of statistical methods, that the
average of, say, D5 and D9 (i.e., the point midway between D5 and D9) is
D7, a condition which can generally obtain only if the interdecile distances
are uniform (for interdecile distances to be uniform requires a rectangular
distribution). In the case of Figure 4.10, for example, inspection shows D7
to be much nearer D5 than D9. Actually, the point midway between D5
and D9 is 40.8 (i.e., 1/2 [36.0 + 45.6] = 40.8), which point lies well above
D7 (D7 = 38.9). Or in the case of Figure 4.11, D7 is much nearer D9 than D5.
In this situation the point midway between D5 and D9 is 48.2, which lies
well below D7 (D7 = 52.1).


TABLE 4.11 Decile Points and Interdecile Differences (Bimodal Distribution)

DECILES    POINTS    DIFFERENCES
  D9        54.4
                         1.1
  D8        53.3
                         1.2
  D7        52.1
                         1.4
  D6        50.7
                         8.7
  D5        42.0
                         8.7
  D4        33.3
                         1.4
  D3        31.9
                         1.2
  D2        30.7
                         1.1
  D1        29.6
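The midpoint comparisons made above are easily verified. The short Python sketch below uses only the decile points D5, D7, and D9 quoted from Tables 4.10 and 4.11; the variable and function names are merely illustrative.

```python
# Decile points quoted from Tables 4.10 and 4.11.
skewed = {"D5": 36.0, "D7": 38.9, "D9": 45.6}     # Table 4.10
u_shaped = {"D5": 42.0, "D7": 52.1, "D9": 54.4}   # Table 4.11


def midpoint(deciles):
    """Point midway between D5 and D9."""
    return (deciles["D5"] + deciles["D9"]) / 2


for name, d in (("skewed", skewed), ("U-shaped", u_shaped)):
    mid = midpoint(d)
    relation = "above" if mid > d["D7"] else "below"
    print(f"{name}: midpoint {mid:.1f} lies {relation} D7 = {d['D7']}")
```

In neither distribution does the midpoint coincide with D7, which it would do only if the interdecile distances were uniform.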

It should be obvious, then, that deciles, or for that matter quartiles or
centiles, cannot be regarded as units in the usual sense, for the distances
between them fluctuate, and further they do not represent distances from
a definite zero point. Thus, even if the attribute involved is measured in
terms of a fundamental scale, it cannot be said that an individual at D9 is
as much above an individual at D8 as, say, an individual at D6 is above an
individual at D5. Similarly, it cannot be said that the score of an individual
at D4 is twice that of an individual at D2. Actually D4 may be only a few
score points above D2. In Figure 4.9, for example, D4 is 23.7 and D2 is 20.7.


We have seen that distances between special percentile points vary
inversely with the magnitude of the frequencies at that portion of the score
scale. This fact makes it possible to gain some notion of the general form
of a distribution from a table of differences between special percentile
points. Consideration of the interdecile differences shown in Tables 4.9
and 4.10 without reference to Figures 4.9 and 4.10 shows that the
distributions involved must both be unimodal with frequencies decreasing on both
sides of the modal frequencies because the interdecile differences are smallest
along one portion of the scale and become increasingly larger on both sides
of this modal portion of the scale. Moreover, the interdecile differences of
Table 4.9 imply that the distribution involved is symmetrical inasmuch as
these differences are symmetrical, that is, the increases in one direction
from the modal portion of the scale match those in the opposite direction.
In the case of Table 4.10, on the other hand, the interdecile differences
imply a distribution skewed to the right, for not only is the modal portion
of the scale not centrally located, but the interdecile differences from this
portion of the scale increase by far greater amounts in the upward direction
than in the downward direction. Similar consideration of Table 4.11,
without reference to Figure 4.11, suggests a symmetrical U-shaped (bimodal)
distribution.

Inferences regarding the symmetry and skewness of a distribution may be
drawn from a consideration of the three quartile points, but it is not
possible to determine from these points whether the distribution is
unimodal, bimodal, or multimodal. In all symmetrical distributions the
distance between Q2 and Q1 is the same as that between Q3 and Q2, whereas in
skewed distributions these distances differ. The distance between Q2
and Q3 will, of course, be the greater in positively
skewed distributions, while that between Q1 and Q2 will be the greater in
negatively skewed distributions. Moreover, the more extreme the skewness
of a distribution, the greater the difference between these two distances.
Hence, quartile points may be used to indicate both the type and degree of
the skewness of a distribution.
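As a rough illustration of this use of the quartile points, the following Python sketch compares the two interquartile distances. The sample is hypothetical. The quartiles are obtained with the standard library function `statistics.quantiles`, whose interpolation rule is not the real-limits method of this chapter, so the values should be taken only as approximations, and the helper name is an assumption made for the example.

```python
import statistics


def quartile_skew(q1, q2, q3):
    """Return (Q3 - Q2) - (Q2 - Q1).

    Positive: upper distance greater (positive skew).
    Zero:     the two distances are equal (symmetry).
    Negative: lower distance greater (negative skew).
    """
    return (q3 - q2) - (q2 - q1)


# HYPOTHETICAL sample with a long upper tail.
sample = [1, 2, 2, 3, 3, 3, 4, 5, 7, 12, 20]

# Three cut points dividing the sample into four equal-sized groups.
q1, q2, q3 = statistics.quantiles(sample, n=4)

print(q1, q2, q3, quartile_skew(q1, q2, q3))
```

The larger the value returned, in either direction, the more extreme the skewness indicated.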


4.13 DISTANCES BETWEEN SPECIAL PERCENTILE POINTS AS AN
INDICATION OF VARIATION AMONG MEASURES


We have previously called attention to the problem of comparing
distributions of measures of some attribute for two or more groups of
individuals for the purpose of determining in which group the measures are the
more variable in magnitude (see Section 2.10). Later we shall devote an
entire chapter to further consideration of this problem. It is appropriate
at this point, however, to call attention to the fact that the principle
developed in the preceding section suggests one possible means of indicating
the degree to which the scores in a collection tend to vary in magnitude.
Since distances between special percentile points are large in those portions
of the scale where frequencies are small, and small in those portions of the
scale where frequencies are large, it follows that if the distance between,
say, Q3 and Q1 is greater in one distribution than in another, then the
relative frequencies over this part of the scale in the former distribution
must be smaller than in the latter distribution. If this is the case, then the
scores in the distribution in which the distance between Q3 and Q1 is greater
must vary more in magnitude. Hence, comparisons for two or more
distributions of the distances between a selected pair of percentile points such as
Q3 and Q1 provide an indication of the relative variability among the scores
comprising these distributions. Other pairs of percentile points, as for
example D9 and D1, may be used instead of Q3 and Q1. It is important to
note that if the distances between pairs of special percentile points are to
be thus compared for the purpose of determining the relative variability
of two distributions, then the measures comprising both distributions must
be in terms of the same score scale.

By way of example, Figure 4.12 shows the smoothed relative frequency
polygons of two idealized population distributions of a continuous attribute.
One of these populations, A, is quite homogeneous, that is, consists of
measures concentrated over a relatively narrow segment of the score scale, or of
measures not markedly different in magnitude. The other population, B,
is quite heterogeneous, that is, consists of measures scattered over a wider
segment of the score scale, or of measures differing more markedly in
magnitude. Both sets of measures are reported in terms of the same score
scale. Inspection of Figure 4.12 shows clearly that the distance from Q3 to
Q1 (or from D9 to D1) is much greater in the case of the heterogeneous
distribution, B, than in the case of the more homogeneous distribution, A. In
fact, the distance from Q3 to Q1 in Distribution B is 7.2 as compared with a
corresponding difference of 2.8 in Distribution A (see Table 4.12). If D9
and D1 are used, the distances are 13.4 for B as compared with 5.6 for A
(see Table 4.12).


[Figure 4.12: two smoothed relative frequency curves, A and B, plotted along
the same Score Scale, with the points D1, Q1, Q3, and D9 of each marked.]

FIGURE 4.12 Smoothed relative frequency polygons (curves) of two
idealized population distributions (A & B) of measures of a continuous
attribute


TABLE 4.12 Special Percentile Points in Distributions of
Populations A and B of Figure 4.12

POINT    POPULATION A    DIFFERENCE    POPULATION B    DIFFERENCE
 D9          27.8                          31.7
                             5.6                           13.4
 D1          22.2                          18.3

 Q3          26.4                          28.6
                             2.8                            7.2
 Q1          23.6                          21.4
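The comparison of Populations A and B amounts to nothing more than two subtractions for each population. A minimal Python sketch follows, using the percentile points cited for Figure 4.12; the value of Q1 for Population B is taken as 21.4, which is what the stated distance of 7.2 from Q3 = 28.6 implies, and the names are illustrative.

```python
# Special percentile points cited for Populations A and B.
# ASSUMPTION: Q1 for B is 21.4, derived from Q3 = 28.6 and the stated
# distance of 7.2.
points = {
    "A": {"D1": 22.2, "Q1": 23.6, "Q3": 26.4, "D9": 27.8},
    "B": {"D1": 18.3, "Q1": 21.4, "Q3": 28.6, "D9": 31.7},
}


def distance(p, upper, lower):
    """Distance along the score scale between two percentile points."""
    return p[upper] - p[lower]


for name, p in points.items():
    print(name,
          "Q3 - Q1 =", round(distance(p, "Q3", "Q1"), 1),
          "D9 - D1 =", round(distance(p, "D9", "D1"), 1))
```

Whichever pair of points is used, the larger distance marks the more variable population, provided both sets of measures are on the same score scale.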
4.14 COMPARISON OF PERCENTILE RANKS DERIVED FOR
DIFFERENT GROUPS


Suppose that in a given group, Individual X has a percentile rank of
R with reference to some trait. Suppose, further, that in a second group
Individual Y also has a percentile rank of R with reference to this same
trait. In this section we shall consider under what condition it may be
averred that the amount of the trait in question possessed by X is the same
as the amount possessed by Y. It is not uncommon to encounter the
assumption that, under such circumstances, the amount of the trait
possessed by X is the same as that possessed by Y, without due regard having
been given the conditions necessary to such an assumption. The mistakes
that thus arise are usually due to the failure to recognize the implications
of the fact that the percentile ranks of X and Y were determined with
reference to two entirely different groups of individuals.

To consider an extreme situation, suppose that the vocabulary-test 
score made by a first-grade pupil has a percentile rank of 90 with reference 
to the scores made on this test by his first-grade classmates. Suppose, 
further, that the score on this same test made by an eighth-grade pupil also 
has a percentile rank of 90 with reference to the scores made on this test 
by his eighth-grade classmates. In this situation it would be expected that
the placement of the eighth-grade distribution would be far above that of
the first grade on the test score scale. Obviously, if this is the case, the
vocabulary-test score which has a percentile rank of 90 in the eighth-grade
class would be much higher than the score which has a percentile rank of 
90 in the first-grade class. It is perhaps absurd to think that anyone would 
be so naive as to assume equality of vocabulary development for the two 
pupils described in this example. Yet it is not uncommon in less extreme 
situations to impute such equality to pupils at like percentile ranks in their 
respective groups in spite of the fact that the groups involved differ in the 
general level of their placement along the score scale. 

But even if the general level of placement of the groups involved were
the same, this is still not a sufficient condition for the equality of individuals
having the same percentile ranks in their respective groups. Refer, for
example, to Distributions A and B shown in Figure 4.12. The medians of
these distributions are identical (A Mdn = B Mdn = 25) and the general
level of the placement of Group A on the score scale is, therefore, the same
as that of Group B. Yet because the B group is so much more variable
with reference to the attribute in question, an individual having a percentile
rank of 90 in that group possesses more of this attribute than an individual
having a percentile rank of 90 in the A group (A group P90 = 27.8, B group
P90 = 31.7).

A percentile rank of R for a given trait in a given group may be
interpreted as the equivalent of a percentile rank of R for this same trait but
in a different group only if the relative frequency distributions of this trait
are the same for both groups, and then only if the scores giving rise to the
PR-values are completely accurate in the sense that they rank the
individuals in each group in the correct order with reference to the trait involved.

It should be noted that it is not necessary to require further that the 




measurements of the trait involved be derived from the same system (test) 
for both groups; for if the measurements are accurate and the above condition 
holds, individuals in different groups who possess equal amounts of the trait 
will rank the same in their respective groups, regardless of the system from 
which the measurements are derived. 




AVERAGES: INDEXES 
OF LOCATION 


5.1 INTRODUCTION: AVERAGE AS A GENERAL TERM 


The familiar term “average” is one for which the popular meanings
are extremely loose and ambiguous. Popularly we use this same term
indiscriminately in speaking, for example, of the “average American,” the
“average personality,” the “average yield of corn per acre,” the “average
household,” the “average high school,” the “average of a distribution of
test scores,” the “average length of life,” etc. Synonyms for the term in
its popular usages are such expressions as “typical,” “usual,”
“representative,” “normal,” and “expected.” If asked to define the term more
accurately, the “average man” might respond that it is the single measure, or
individual, or object, or characteristic that best represents a group or
collection of such measures, or individuals, or objects, or characteristics.
However, if he is then asked to select this most representative object or
measure from the group, he is likely to become less specific. He may say
that in order to find the average of a group of measures you simply “add
them all up and divide by the number of them,” but such a concept becomes
meaningless when applied to characteristics that cannot be numerically
represented, as in the case of the “average American” or the “average
personality.” As we shall subsequently show, even if the characteristic
involved may be measured or numerically represented, this process of dividing
the sum by the number does not in all cases yield the most “typical” or
“representative” result.



Whatever may be the specific meanings of the word “average” it is
clear from the popular meaning of the term that the use of an “average” 
adds greatly to the convenience with which we can reason about groups or 
make comparisons between groups. No person can bear in mind simul- 
taneously the individual characteristics of the objects comprising a large 
collection or group, but he has little difficulty in handling such groups in 
his thinking when he can let a single quantitative index represent the 
whole, that is, when he can use an “average” as a concise and simple picture 
of the large group from which it is derived. 

Suppose, for example, that we are faced with the problem of comparing 
two large collections of numerical data. We could, of course, organize the 
two sets of data into relative frequency distributions and superimpose the 
two corresponding polygons on the same axes. Consideration of the result- 
ing figure would reveal whether the scores of one of the collections tended 
on the whole to be larger—that is, to be placed or located higher on the 
score scale—than the scores of the other; or whether the scores in one col- 
lection were more variable than the scores in the other; or whether there 
were any notable differences in the form of the two score distributions. But
even though general comparisons of these types may be made, the fact
remains that it would be convenient and useful to have some single
quantitative index of the location of a collection of scores considered as a whole, or
of the degree to which the scores in such a collection differ in magnitude.
Indexes of the latter type, that is, indexes of variability or dispersion, will
be treated in the following chapter. In this chapter we shall be concerned
with indexes of location, or indexes of central tendency. In statistics
such indexes are known as “averages.” In statistical literature “average”
is a general term applying to all kinds of indexes or measures of the location
of a collection of scores considered as a whole.

There are at least five averages in common use: the mode, the median,
the arithmetic mean, the geometric mean, and the harmonic mean. Of
these, only the first three are considered in this text. While these various
averages are all points on the score scale indicative of the placement of the
collection of scores as a whole, they possess different properties
or characteristics so that under one set of circumstances one average may
be preferable to the others, whereas under another set of circumstances
some other one of the averages may be preferable. In the remainder
of this chapter we shall define the first three of the averages cited,
investigate their properties, and consider the circumstances in which each
should be employed.
5.2 MODE DEFINED

In many large collections of numerical data there is a clear-cut tendency
for a certain score value to occur with greater frequency than any
other. If such a collection is organized into a unit-interval frequency dis- 
tribution the value of this score is readily determined since it is simply the 
score corresponding to the largest frequency value. Often such scores are 
more or less centrally located with reference to the other score values which 
in turn tend to occur with decreasing frequency in either direction from this 
most frequently occurring value. Such a most frequently occurring score 
clearly provides an indication of the placement along the score scale of the 
distribution as a whole and, hence, may be used as an average. This aver- 
age, which in effect indicates the location along the score scale of a “pile-up” 
or concentration of score values, is called a mode. 

Occasionally the scores comprising a collection will tend to “pile up” 
at two distinctly separate places on the score scale. In such situations the
distribution is regarded as having two modes, that is, as being bimodal, even 
though the concentration at one place may be considerably greater than 
that at the other. Some distributions may even involve more than two dis- 
tinctly separate concentrations of scores. Such distributions, of course, 
have more than two modes. In general, distributions having more than 
one mode are referred to as multimodal distributions. Figure 5.1 shows the
histogram of an imaginary multimodal distribution. This hypothetical
distribution has modes at 13, 21, and 30 because each of these score values
is the most frequently occurring score in a distinctly separate concentration
of score values.

[Figure 5.1 Histogram of a hypothetical frequency distribution. Horizontal axis: score scale, marked from 10 to 40.]

A more or less formal summary of the foregoing concepts is contained 
in the following definition.

DEFINITION. A mode (Mo) of a frequency distribution is a point on the
score scale corresponding to a frequency which is large in relation to other
frequency values in its neighborhood.


Most of the population distributions which are of interest to the psychologist
or educator have but one mode, that is, are unimodal distributions.
Of course, because of chance sampling fluctuations, the score distributions 
of samples taken from these populations often appear multimodal. That is, 
there will be a number of frequencies in the sample distributions which are 
larger than adjoining or neighboring frequencies simply owing to accidental 
sampling fluctuations. Such chance large frequencies should, of course, not 
be regarded as determining the mode or modes of such distributions. Only 
those large frequencies which are clearly the peaks of major concentrations 
of scores should be considered as establishing modal points. 

It is clear, then, that the determination of the mode or modes of a 
distribution often involves a judgment as to which large frequencies should 
be ignored. In doubtful situations it is best to increase the size of the sample 
for the purpose of noting whether or not the questionable concentrations of 
scores persist. When this cannot be done it is perhaps best to follow an 
earlier suggestion (see Section 2.4) and either set up a grouped frequency 
distribution with relatively coarse intervals or resort to free-hand smooth- 
ing. When a grouped frequency distribution with coarse intervals or classes 
is used as a basis for fixing a mode, the value of this mode is taken to be 
the midpoint of the interval the frequency of which is large in relation to 
the frequencies of neighboring intervals.

Many people studying statistics for the first time seem determined to 
regard the value of the large frequency itself as the value of the mode. 
They would, for example, regard 16 as the mode of the distribution shown
in Figure 2.9 (or see the table accompanying Figure 2.1). Actually, of 
course, the mode of a distribution is the value of the score point correspond- 
ing to the large frequency and not the value of this frequency itself. Thus, 
in Figure 2.9 the mode has the value 72 (i.e., the midpoint of the interval 


having the large frequency 16) rather than 16. 
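The distinction between the modal score and its frequency can be checked on a small scale. The following is a minimal sketch, assuming the 50 anticipation-test scores of Table 5.1 (Section 5.5) as data; the variable names are illustrative only. It simply takes the single largest frequency and makes no attempt at the judgment, discussed above, of which sample peaks represent major concentrations.

```python
from collections import Counter

# Scores of Table 5.1 (50 subjects, 25-word anticipation test).
scores = [
    18, 15, 10, 12, 9, 13, 11, 17, 8, 9,
    10, 7, 15, 5, 16, 8, 12, 10, 12, 10,
    9, 14, 21, 11, 9, 18, 4, 15, 11, 13,
    8, 13, 6, 10, 11, 8, 12, 7, 14, 10,
    11, 10, 9, 11, 10, 8, 10, 9, 9, 16,
]

# Unit-interval frequency distribution: score value -> frequency.
freq = Counter(scores)

# The mode is the score value with the largest frequency,
# not the frequency itself.
mode_score, mode_freq = freq.most_common(1)[0]

print("mode (a point on the score scale):", mode_score)
print("frequency at the mode (not the mode):", mode_freq)
```

Here the score value and its frequency are different numbers, which makes the error described above easy to see.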


5.3 MEDIAN DEFINED

The median is a special percentile point and has been defined in Section 4.4
(see p. 70). We shall simply restate the definition here for sake of
completeness.

DEFINITION. The median (Mdn) of a distribution is the point on the
score scale below which one-half, or 50 per cent, of the scores fall.


5.4 THE ARITHMETIC MEAN DEFINED 


The arithmetic mean is the most generally useful and the most important
of the three means.* For this reason it is common practice to refer
to it simply as the mean. It is the only mean which will be treated in
this book.

* Arithmetic, geometric, and harmonic.


DEFINITION. The mean of a distribution of scores is the point on the 
score scale corresponding to the sum of the scores divided by their number. 


In popular usage the mean is often referred to as the "average." It is
variously designated by the symbols M, m, $\bar{X}$ (where the individual scores
are represented by X's) and $\mu$ (the lower-case Greek letter mu). In this
book we shall usually use the device of representing the mean of a given
real collection of X-scores by $\bar{X}$, or of a given real collection of Y-scores
by $\bar{Y}$. Later we shall find it necessary to deal with certain theoretical or
hypothetical score distributions. Such theoretical score distributions will
usually apply to some population all members of which are not actually
available for measurement. We shall reserve the use of the Greek letter $\mu$
to represent the means of such theoretical distributions.

It is possible to state the above definition symbolically. Let any col- 
lection of N scores be represented by 


$$X_1, X_2, X_3, \ldots, X_N \qquad \text{[see (3.1)]}$$

Then the sum of these N scores may be represented by

$$\sum X_i \qquad \text{[see (3.5)]}$$

Hence, the definition of the mean of any distribution of N scores may be
written

$$\bar{X} = \frac{\sum X_i}{N} \tag{5.1}$$


To understand the mean as an index of location or central tendency it 
may be helpful to observe that the mean is that score value which would 
be assigned each individual or object if the total for the distribution were 
to be evenly or equally distributed among all the individuals involved. 
It may be thought of as an amount “per individual” or “per object." Per 
capita figures, then, are actually means. Thus, the statement that the per 
capita debt of the federal government is $1,634 simply implies that at a 
given time the total debt divided equally among the individuals of the 
entire population is $1,634. Since this amount corresponds to the total 
debt divided by the number of “debtors” it is by definition a mean. 

There are two important implications of the definition of a mean which
should be noted. In the first place, it is the only one of the three averages
considered here which is dependent upon the exact value of each score in
the entire distribution. Any change in the value of any score in the
distribution will be reflected in the sum of the scores and hence in the mean.
The median, on the other hand, will reflect a change in the value of a score
only if that change results in a shift of that score past the original
position of the median. When such a shift occurs,
the percentage of scores below the original position of the median will no 
longer be fifty, and consequently the median point will have to be relocated 
to conform to the requirement that exactly 50 per cent of the scores lie 
below it. But if changes in the values of certain scores do not shift them 
from one half of the distribution to the other, the location of the median 
will remain unchanged, regardless of how great these changes may be. 

Consider, for example, the following collection of five scores arranged 
in order of magnitude: 

17, 21, 22, 26, 29 

The median of these scores is 22. Their sum is 115 so that their mean is 23. 
Now suppose it is discovered that an error has been made and that the top 
score should have been 39 instead of 29. The median remains at 22 as be- 
fore, but the mean now becomes 25, thus reflecting the upward change in 
the value of this single score. Similarly, changes may occur in the values of 
certain scores in a distribution without affecting the mode, so that of the 
three averages treated here, only the mean depends upon the exact value 
of each score in the collection. 
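The five-score example can be replayed directly. This is a small sketch using Python's standard `statistics` module; the list names `original` and `corrected` are illustrative and not part of the text.

```python
from statistics import mean, median

original = [17, 21, 22, 26, 29]
corrected = [17, 21, 22, 26, 39]  # top score corrected from 29 to 39

# The median depends only on which score occupies the middle position,
# so correcting the top score leaves it untouched.
median_before = median(original)
median_after = median(corrected)

# The mean depends on the sum of all scores, so it shifts by 10 / 5 = 2.
mean_before = mean(original)
mean_after = mean(corrected)

print("median:", median_before, "->", median_after)
print("mean:  ", mean_before, "->", mean_after)
```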

The second implication of the definition of the mean which should be 
noted at this point is that it is the only one of the averages which is a func- 
tion of the total or aggregate of the scores comprising the collection. Since 
by definition the mean is the sum—i.e., the total or aggregate—of the 
scores in the collection divided by the number, it follows that the total or 
aggregate of the collection of scores is the product of their mean times their 
number. This relationship may be stated symbolically as follows:

$$\sum X_i = N\bar{X} \tag{5.2}$$


Because of these aspects of the definition of the mean, it may be said 
that of the three averages considered here, only the mean is arithmetically 
or algebraically defined. It is largely this characteristic of the mean which 
gives it such a great advantage over the mode and the median in both 


applied and theoretical statistics. 


5.5 COMPUTING THE MEAN 


Table 5.1 gives the scores made on a 25-word anticipation test by 50
subjects participating in a psychological experiment on serial learning.*

* The anticipation method is frequently used in psychological research on serial
learning. While many variations are possible, the method consists essentially in
presenting one at a time and always in the same fixed order a series of words, or syllables,
or numbers, to be learned. After a period of learning the subjects are tested by being
asked to "anticipate" the next item in the series while viewing or hearing its
predecessor. The number of items correctly anticipated becomes the subject's
score and is taken to be indicative of his learning success or of his retention,
depending upon the time lapse between the end of the learning period and the
administration of the test.

TABLE 5.1 Scores of 50 Subjects on a 25-Word Anticipation Test 


18 15 10 12 9 13 11 17 8 9 
10 7 15 5 16 8 12 10 12 10 
9 14 21 11 9 18 4 15 11 13 
8 13 6 10 11 8 12 7 14 10 
11 10 9 11 10 8 10 9 9 16 


Suppose that it is required to find the mean of these 50 scores. Following
the instructions of the definition of the mean we see that it is necessary only
to determine the sum of these 50 scores and to divide this sum by 50, i.e.,
by the number of scores. How the instructions or directions of the symbolic
statement of the definition given in (5.1) may be applied to determining
the mean of the scores given in Table 5.1 is shown below.

$$\bar{X} = \frac{\sum X_i}{N} = \frac{18 + 15 + 10 + \cdots + 16}{50} = \frac{554}{50} = 11.08$$

Sometimes the analysis of the data may call for the preparation of a
unit-interval frequency distribution. For example, it may be necessary to
determine the percentile rank of each score point. When the situation calls
for the preparation of such a frequency distribution, it is usually more
convenient to defer the computation of the mean until the frequency
distribution is prepared, for it is a simple matter to compute the mean of data
organized in this form. To illustrate the procedure, the data of Table 5.1
have been organized into a frequency distribution involving unit intervals.
The resulting distribution is shown in Table 5.2.

TABLE 5.2 Unit-Interval Frequency Distribution of 50 Scores Given in Table 5.1

X (Score)     f     fX
   21         1     21
   20         0      0
   19         0      0
   18         2     36
   17         1     17
   16         2     32
   15         3     45
   14         2     28
   13         3     39
   12         4     48
   11         6     66
   10         9     90
    9         7     63
    8         5     40
    7         2     14
    6         1      6
    5         1      5
    4         1      4

The fX column of Table 5.2 contains the subtotals for the scores in each
class. Thus, the subtotal for the class or interval 15 is 45 since, as the
f-value shows, 3 scores fall in this class and each is taken to have the value
of the class midpoint (i.e., 15). That is to say, since there are 3 scores of 15
in the collection, the subtotal for this class of scores is 15 + 15 + 15 = 45,
or using multiplication instead of addition, 3 × 15 = 45. To find the subtotal
associated with each class it is necessary, then, only to find the product of
the class frequency times the class score value, i.e., the class midpoint. Once
the class subtotals are determined they may be added in order to obtain the
sum of all the scores in the entire collection. This grand total divided by the
number of scores involved is the mean. In our example this grand total is 554
and hence the mean is 554/50 = 11.08 as before.

If we represent the frequency distribution symbolically, using the
notational scheme described in Section 3.4, then the total of the N scores
involved is as given in (3.13), i.e., $\sum f_j X_j$. Hence, if we adapt the definition
of the mean to this situation we obtain the following computational formula

$$\bar{X} = \frac{\sum f_j X_j}{N} \tag{5.3}$$

The application of this formula to our example is spelled out below.

$$\bar{X} = \frac{\sum f_j X_j}{N} = \frac{(1)(21) + (0)(20) + (0)(19) + (2)(18) + \cdots + (1)(4)}{50} = \frac{554}{50} = 11.08$$
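Formula (5.3) may also be illustrated in code. The sketch below, which assumes Python's `collections.Counter` as a stand-in for the hand tally, first builds the unit-interval frequency distribution from the scores of Table 5.1 and then sums the class subtotals; all names are illustrative.

```python
from collections import Counter

# Scores of Table 5.1.
scores = [
    18, 15, 10, 12, 9, 13, 11, 17, 8, 9,
    10, 7, 15, 5, 16, 8, 12, 10, 12, 10,
    9, 14, 21, 11, 9, 18, 4, 15, 11, 13,
    8, 13, 6, 10, 11, 8, 12, 7, 14, 10,
    11, 10, 9, 11, 10, 8, 10, 9, 9, 16,
]

# Unit-interval frequency distribution: score value X -> frequency f.
freq = Counter(scores)

# Print the X, f, and fX columns from the highest score down.
# A score value that never occurs has frequency 0.
for x in range(max(freq), min(freq) - 1, -1):
    print(f"{x:>3} {freq[x]:>3} {freq[x] * x:>4}")

N = sum(freq.values())                       # total number of scores
total = sum(f * x for x, f in freq.items())  # sum of the fX subtotals
mean = total / N                             # formula (5.3)

print(f"N = {N}, sum of fX = {total}, mean = {mean:.2f}")
```

Because every class spans a single unit, the subtotals are exact and the result agrees with that given by (5.1).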
It should be noted that (5.3) is computationally more convenient and
efficient than (5.1) only if the frequency distribution is already available.
That is, if the time spent in organizing the data into a frequency distribution
is counted as part of the time spent in the calculation of the mean, then
the use of (5.1) is more efficient than (5.3). If, on the other hand, such a
distribution must be prepared anyway for some other purpose, it is usually
more efficient to employ (5.3) in calculating the mean.

It is also possible to use (5.3) with a grouped frequency distribution,
that is, with a frequency distribution the classes of which span more than
one unit. In this case, however, the mean resulting from the application of
(5.3) will be only an approximation of the mean obtained by (5.1), that is,
of the mean of the original ungrouped scores. As has been previously
indicated (see Sections 2.5 and 3.5), the approximate character of a mean
obtained by (5.3) in this situation is due to the failure of the interval
midpoints to represent with complete accuracy the magnitudes of the scores
falling in the intervals. If the procedure suggested in Section 2.5 for
selecting the classes is followed, if no equally spaced clustering is involved (see
Section 2.6), and if the distribution is fairly symmetrical, means computed 
from grouped frequency distributions will usually be sufficiently accurate 
for all practical purposes. 

Inaccuracies arising from the use of grouped data are known as grouping 
errors. Specifically in the case of the mean, grouping error may be defined 
as the difference between the magnitude of the mean computed from the 
grouped frequency distribution and that of the mean computed from the 
original unordered scores or from a unit-interval frequency distribution. 
Clearly, if for each interval of a grouped frequency distribution the product
of the frequency ($f_j$) times the midpoint ($X_j$) is the same as the sum of the
original values of the scores classified in it, there can be no grouping error.
In this situation the correct interval subtotals are given by the $f_j X_j$ products
and the grand total is necessarily the same as that of the original N scores.
It is important to note that an $f_j X_j$ product will give a correct subtotal if
the interval midpoint ($X_j$) is itself the mean of the original values of the
scores classified in this interval. That is, if

$$X_j = \frac{\text{sum of scores in interval}}{f_j}$$

then

$$f_j X_j = \text{sum of scores in interval}$$

In most intervals the midpoints will differ from the means of the scores
classified in them. However, in any given distribution these differences are
not likely to be all in the same direction. That is, in some intervals the
midpoints will be larger than the means of the scores classified in them, giving
rise to $f_j X_j$ products which are too large, whereas in other intervals the
reverse will be true. The net effect of these opposite errors upon the grand
total as found by summing the $f_j X_j$ products is, therefore, usually negligible.

To illustrate this effect the data of Table 5.2 have been organized into
a grouped frequency distribution with intervals of size 3.* This grouped
frequency distribution together with the approximate and correct interval
subtotals and the differences between them are shown in Table 5.3. The
correct or actual interval subtotals were, of course, obtained from the
original score values as given in the unit-interval frequency distribution of
Table 5.2. For example, consider specifically the interval 6-8. This
interval has the midpoint 7 and contains 8 scores so that its approximate
contribution to the grand total is 56 (i.e., 8 × 7). Actually, however, this
interval contains 5 scores of value 8, 2 of 7, and 1 of 6, so that the correct
subtotal involved is 60, i.e., (5 × 8) + (2 × 7) + (1 × 6). The mean value
of these 8 scores is actually 7.5 (i.e., 60 ÷ 8). In this case since the interval
midpoint (7) is smaller than the interval mean the approximate subtotal
is too small. Other intervals, however, give rise to approximate subtotals
which are too large so that the net error is negligible. In our example this
error in the grand total is +3 so that the error in the mean is only +.06
(i.e., 3 ÷ 50).*

TABLE 5.3 Grouped Frequency Distribution for Table 5.1 Showing Differences Between Actual and Estimated Interval Subtotals

CLASS    X (MIDPOINT)    f    ESTIMATED SUBTOTAL (fX)    ACTUAL SUBTOTAL    ERROR
21-23        22           1             22                     21             +1
18-20        19           2             38                     36             +2
15-17        16           6             96                     94             +2
12-14        13           9            117                    115             +2
 9-11        10          22            220                    219             +1
 6-8          7           8             56                     60             -4
 3-5          4           2              8                      9             -1
TOTALS                   50            557                    554             +3

* An interval of size 3 is actually too coarse for use with these data if the resulting
distribution is to be used for computational purposes (see Section 2.5), so that it should
be clearly recognized that our use of an interval of this size with these data is for
convenience of illustration only.
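The interval-by-interval comparison of estimated and actual subtotals can be reproduced mechanically. The following sketch assumes intervals of size 3 whose lower limits are multiples of 3, which gives the interval 6-8 discussed above with midpoint 7; the variable names and the row layout are illustrative, not the book's.

```python
from collections import defaultdict

# Scores of Table 5.1.
scores = [
    18, 15, 10, 12, 9, 13, 11, 17, 8, 9,
    10, 7, 15, 5, 16, 8, 12, 10, 12, 10,
    9, 14, 21, 11, 9, 18, 4, 15, 11, 13,
    8, 13, 6, 10, 11, 8, 12, 7, 14, 10,
    11, 10, 9, 11, 10, 8, 10, 9, 9, 16,
]

WIDTH = 3  # interval size (assumed lower limits: 3, 6, 9, ...)

# Classify each score by the lower limit of its interval.
groups = defaultdict(list)
for x in scores:
    groups[(x // WIDTH) * WIDTH].append(x)

rows = []
for lower in sorted(groups, reverse=True):
    members = groups[lower]
    midpoint = lower + (WIDTH - 1) // 2  # 7 for the interval 6-8
    f = len(members)
    estimated = f * midpoint             # fX, the approximate subtotal
    actual = sum(members)                # subtotal of the original scores
    label = f"{lower}-{lower + WIDTH - 1}"
    rows.append((label, midpoint, f, estimated, actual, estimated - actual))

for row in rows:
    print(row)

estimated_total = sum(r[3] for r in rows)
actual_total = sum(r[4] for r in rows)
net_error = estimated_total - actual_total
print("net error in grand total:", net_error)
print("error in mean:", net_error / len(scores))
```

The positive and negative interval errors largely cancel, which is why the net error in the grand total is small compared with the individual interval errors.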

In a unimodal distribution the midpoints of intervals below the mode
will in general tend to be smaller than the means of the scores classified in
these intervals. This follows from the fact that in such a distribution more
of the scores will tend to fall in the upper half of these intervals, that is,
in the half nearer the mode of the whole distribution, than in the lower
half. Hence, intervals below the mode tend to give rise to approximate
subtotals which are too small. However, the reverse is true of the approximate
interval subtotals of the upper half of the distribution. Consequently
the net error remaining in the approximate grand total is usually negligible
in the case of roughly symmetrical distributions. The error which does
remain results largely from the fact that the mode of the original scores
may not be centered in the modal interval. If no equally spaced clustering
is involved, so that the arbitrary rule we have adopted for interval placement
is appropriate, the mode of the original scores is equally likely to
fall in either half of the modal interval. Consequently that grouping error
which remains is not systematic, but about as likely to be in one direction
as in the other.

From the foregoing discussion it should also be apparent that the error
in the mean computed from grouped data tends to be positive (i.e., the
mean tends to be too large) when the distribution is skewed to the right and
negative when the distribution is skewed to the left.

* $\bar{X} \approx \frac{\sum f_j X_j}{N} = \frac{(1)(22) + (2)(19) + \cdots + (2)(4)}{50} = \frac{557}{50} = 11.14$.
Error = 11.14 − 11.08 = +.06

Finally, it should be observed that the grouping error tends to increase 
with increases in the coarseness of the intervals. It is for this reason that 
a minimum of 15 classes was recommended in Section 2.5. To illustrate 
this last point, Table 5.4 shows the data of Table 5.1 organized into a dis- 
tribution involving only three extremely large intervals, each spanning 
9 units. The grouping error in the mean computed from this distribution 
is + .66, an amount 11 times as great as occurred in the case of the dis- 
tribution of Table 5.3. 


TABLE 5.4 Data of Table 5.1 Classified Into Extremely Coarse Classes

CLASSES    X (MIDPOINT)     f     fX    TRUE SUBTOTAL    ERROR
18-26          22            3     66         57           +9
 9-17          13           37    481        428          +53
 0-8            4           10     40         69          -29
TOTALS                      50    587        554          +33

Mean from grouped data = 587/50 = 11.74     Error = 11.74 − 11.08 = +.66
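The growth of the grouping error with interval coarseness can be checked numerically. The sketch below is an illustration only: the helper `grouped_mean` is a hypothetical name, and it assumes intervals whose lower limits are multiples of the interval size, which reproduces the classes 3-5, 6-8, ... and 0-8, 9-17, 18-26 used in the two tables.

```python
# Scores of Table 5.1.
scores = [
    18, 15, 10, 12, 9, 13, 11, 17, 8, 9,
    10, 7, 15, 5, 16, 8, 12, 10, 12, 10,
    9, 14, 21, 11, 9, 18, 4, 15, 11, 13,
    8, 13, 6, 10, 11, 8, 12, 7, 14, 10,
    11, 10, 9, 11, 10, 8, 10, 9, 9, 16,
]


def grouped_mean(values, width):
    """Mean obtained when each score is replaced by its interval midpoint.

    Assumes interval lower limits at multiples of `width`.
    """
    total = 0.0
    for x in values:
        lower = (x // width) * width
        total += lower + (width - 1) / 2  # midpoint of the interval
    return total / len(values)


true_mean = sum(scores) / len(scores)

for width in (1, 3, 9):
    approx = grouped_mean(scores, width)
    print(f"width {width}: mean = {approx:.2f}, "
          f"grouping error = {approx - true_mean:+.2f}")
```

With unit intervals the error vanishes, and it increases as the intervals are made coarser, in line with the comparison of Tables 5.3 and 5.4.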


5.6 GROUPING ERROR IN THE MODE

In Section 5.2 it was suggested that when the data are organized into a
grouped frequency distribution the midpoint of the modal interval be used
as an estimate of the mode of the original distribution. Since the location
of the mode of the original scores within the modal interval depends upon
the placement of the intervals along the score scale, and since this placement
is determined by rule-of-thumb procedure, the midpoint of the modal
interval will tend to fall above the mode of the original scores about as
often as it will tend to fall below it. That is to say, there is no systematic or
directional tendency in the error in the mode computed from grouped data.


5.7 SOME SIMPLE RULES REGARDING THE MEAN

In this section we shall present some simple and useful rules regarding
the mean. We shall illustrate each with a simple numerical example and
also present its proof or derivation. While some beginning students may
wish to omit consideration of these proofs, it is important that all students
acquire a complete understanding of each rule. Perhaps the best way to
acquire such understanding is to check or verify the rule in the case of a
specific example. For this reason the order of presentation will consist of
(1) the statement of the rule, (2) a verification in the case of a specific
numerical example, and (3) the proof.

RULE 5.1. The mean, M, of a collection of scores formed by pooling k
subgroups of scores, is the sum of the products of each subgroup mean multiplied
by its number divided by the total number of scores in all subgroups. Or
symbolically,

$$M = \frac{\sum n_j \bar{X}_j}{\sum n_j} \tag{5.4}$$

where $n_j$ = the number of scores in subgroup j, and
$\bar{X}_j$ = the mean of subgroup j.

Example. Consider the following 5 subgroups of scores (here k = 5):

Subgroup    Scores    Sums     n    Mean
   1         ...       21      3      7
   2         ...       20      5      4
   3         ...       22      2     11
   4         ...       15      3      5
   5         ...       24      4      6
Totals                102     17

By (5.4), the mean, M, of the entire collection may be found as follows:

$$M = \frac{\sum n_j \bar{X}_j}{\sum n_j} = \frac{(3 \times 7) + (5 \times 4) + (2 \times 11) + (3 \times 5) + (4 \times 6)}{3 + 5 + 2 + 3 + 4} = \frac{102}{17} = 6$$

To verify this result it is necessary only to find the grand total of all 17
scores and divide by 17. This grand total may most easily be determined
by adding the subtotals as found separately for each group. Thus,

$$M = \frac{21 + 20 + 22 + 15 + 24}{17} = \frac{102}{17} = 6$$
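Rule 5.1 amounts to a weighted mean of the subgroup means. A minimal sketch, assuming only the subgroup sizes and means of the example (the individual scores are not needed), with illustrative variable names:

```python
# (n_j, mean_j) for each of the k = 5 subgroups of the example.
subgroups = [(3, 7), (5, 4), (2, 11), (3, 5), (4, 6)]

# Numerator of (5.4): each subgroup mean weighted by its number of scores.
weighted_total = sum(n * m for n, m in subgroups)

# Denominator of (5.4): total number of scores in all subgroups.
N = sum(n for n, _ in subgroups)

M = weighted_total / N
print(f"sum of n*mean = {weighted_total}, N = {N}, M = {M}")
```

Each product n × mean recovers the corresponding subgroup sum (21, 20, 22, 15, 24), so the numerator is simply the grand total of the pooled scores.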


Proof. We have given a large collection of N scores organized into
k subgroups. A scheme for representing this situation symbolically has
been presented in Sections 3.9 and 3.10. We shall use here the notation
established in these sections (see Table 3.3). Now the sum of all the scores
in the entire collection is given by

$$\sum_{j=1}^{k} \sum_{i=1}^{n_j} X_{ij} = \sum_{i=1}^{n_1} X_{i1} + \sum_{i=1}^{n_2} X_{i2} + \cdots + \sum_{i=1}^{n_k} X_{ik} \qquad \text{[see (3.29)]}$$

Hence, by definition (5.1) the mean of the entire collection is

$$M = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n_j} X_{ij}}{N} \tag{1}$$

Now letting $\bar{X}_1$ represent the mean of Subgroup 1, $\bar{X}_2$ the mean of
Subgroup 2, etc., we have by application of (5.2)

$$\sum_{j=1}^{k} \sum_{i=1}^{n_j} X_{ij} = n_1 \bar{X}_1 + n_2 \bar{X}_2 + \cdots + n_k \bar{X}_k = \sum_{j=1}^{k} n_j \bar{X}_j \tag{2}$$

Also, from (3.33) we have

$$N = \sum_{j=1}^{k} n_j \tag{3}$$

Now, substituting from (2) and (3) into (1), we obtain

$$M = \frac{\sum_{j=1}^{k} n_j \bar{X}_j}{\sum_{j=1}^{k} n_j}$$

which establishes the rule.


RULE 5.1a. If each subgroup contains the same number of individuals,
n, the mean of the entire collection, M, is the mean of the
subgroup means. Or symbolically,

$$M = \frac{\sum \bar{X}_j}{k} \tag{5.5}$$

Example. Consider the following 4 (here k = 4) subgroups of scores.

Subgroup    Scores     Sums    Mean
   1        8, 4, 9     21       7
   2        0, 7, 2      9       3
   3        5, 5, 8     18       6
   4        4, 2, 6     12       4
Totals                  60      20

Applying (5.5), we obtain

$$M = \frac{\sum \bar{X}_j}{k} = \frac{7 + 3 + 6 + 4}{4} = \frac{20}{4} = 5$$

To verify this result we shall simply apply (5.1). Thus

$$M = \frac{8 + 4 + 9 + 0 + 7 + 2 + 5 + 5 + 8 + 4 + 2 + 6}{12} = \frac{60}{12} = 5$$
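The special case can be verified in code, together with a reminder that it depends on the subgroups being of equal size. In the sketch below the four subgroups are those of the example; the pair of unequal subgroups at the end is a made-up illustration, not taken from the text.

```python
from fractions import Fraction

# The four equal-sized subgroups of the example (n = 3 scores each).
subgroups = [[8, 4, 9], [0, 7, 2], [5, 5, 8], [4, 2, 6]]

# Exact subgroup means, then the mean of those means, as in (5.5).
means = [Fraction(sum(g), len(g)) for g in subgroups]
mean_of_means = sum(means) / len(means)

# Mean of the pooled collection, by definition (5.1).
pooled = [x for g in subgroups for x in g]
pooled_mean = Fraction(sum(pooled), len(pooled))

print("mean of subgroup means:", mean_of_means)
print("mean of pooled scores: ", pooled_mean)

# Hypothetical unequal subgroups: here the two quantities differ,
# and the weighted form (5.4) is required instead.
unequal = [[1, 1, 1, 1], [5]]
unequal_means = [Fraction(sum(g), len(g)) for g in unequal]
unequal_mean_of_means = sum(unequal_means) / len(unequal_means)
unequal_pooled_mean = Fraction(sum(sum(g) for g in unequal),
                               sum(len(g) for g in unequal))
print("unequal sizes:", unequal_mean_of_means, "vs", unequal_pooled_mean)
```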


Proof. Since here each subgroup contains the same number of scores,
n, we may drop the group identification subscript from n in (5.4) thus:

$$M = \frac{\sum_{j=1}^{k} n \bar{X}_j}{\sum_{j=1}^{k} n}$$

Now applying (3.19) and (3.21) we obtain

$$M = \frac{n \sum_{j=1}^{k} \bar{X}_j}{kn} = \frac{\sum_{j=1}^{k} \bar{X}_j}{k}$$

which proves the rule.


RULE 5.2. Given m part scores and the sum, S, of these m scores for each
of n individuals. Then the mean of these sums, $\bar{S}$, is equal to the sum of the
means of the separate parts. Or symbolically,

$$\bar{S} = \sum_{j=1}^{m} \bar{X}_j \tag{5.6}$$

where $\bar{X}_j$ represents the mean of the n scores of part j.

Example. Consider an arithmetic test consisting of three distinct
parts: Part 1, Fundamental Concepts; Part 2, Fundamental Operations;
and Part 3, Problems. From each part a separate score is derived (here
m = 3). Suppose this test to have been administered to 5 children (n = 5)
with results as shown in Table 5.5.

TABLE 5.5 Scores of Five Children on a Three-Part Arithmetic Test

             Part Scores (X)          Total
Pupil     Part 1   Part 2   Part 3   Score (S)
  1          4        6        5        15
  2          7       11        2        20
  3          4        9        7        20
  4          9        6        2        17
  5          6        8        4        18
TOTALS      30       40       20        90
MEANS        6        8        4        18

Now applying (5.6) we have

$$\bar{S} = \sum \bar{X}_j = 6 + 8 + 4 = 18$$

To verify this result we need only apply (5.1) to determine the mean of
the S-values.

$$\bar{S} = \frac{\sum S_i}{n} = \frac{15 + 20 + 20 + 17 + 18}{5} = \frac{90}{5} = 18$$
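Rule 5.2 may be checked against the table of part scores. In the sketch below the rows for pupils 1, 4, and 5 are as printed; the rows for pupils 2 and 3 are inferred from the column totals (30, 40, 20) and the total scores used in the verification (20 and 20), so they should be read as a reconstruction. Variable names are illustrative.

```python
# Part scores (Part 1, Part 2, Part 3) for each of the n = 5 pupils.
# Rows for pupils 2 and 3 are inferred from the printed totals.
part_scores = [
    [4, 6, 5],
    [7, 11, 2],
    [4, 9, 7],
    [9, 6, 2],
    [6, 8, 4],
]
n = len(part_scores)

# Mean of each part: average down the columns.
part_means = [sum(column) / n for column in zip(*part_scores)]

# Total score S for each pupil, and the mean of these totals.
totals = [sum(row) for row in part_scores]
mean_of_totals = sum(totals) / n

print("part means:        ", part_means)
print("sum of part means: ", sum(part_means))
print("pupil totals (S):  ", totals)
print("mean of S:         ", mean_of_totals)
```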


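Proof. We have given m part scores and the sum, S, of these m scores
for each of n individuals. The general scheme for representing this situation
symbolically has been presented in Sections 3.11 and 3.12. We shall use
here the notation established in these sections (see Table 3.4). Now
by (5.1),

$$\sum_{j=1}^{m} \bar{X}_j = \frac{\sum_{i=1}^{n} X_{i1}}{n} + \frac{\sum_{i=1}^{n} X_{i2}}{n} + \cdots + \frac{\sum_{i=1}^{n} X_{im}}{n} = \frac{\sum_{j=1}^{m} \sum_{i=1}^{n} X_{ij}}{n}$$

(since the m fractions have the common denominator n). Now applying (3.39),

$$\sum_{j=1}^{m} \bar{X}_j = \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} X_{ij}}{n}$$

But $\sum_{j=1}^{m} X_{ij} = S_i$, and hence

$$\sum_{j=1}^{m} \bar{X}_j = \frac{\sum_{i=1}^{n} S_i}{n} = \bar{S}$$

which establishes the rule.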


RULE 5.3. Let a constant amount, C, be added to each of N scores. Then
the mean of the new set of scores thus formed is equal to the mean of the original
set plus this constant amount. Or symbolically,

$$M_{X+C} = \bar{X} + C \tag{5.7}$$

Example. Consider the 5 scores 10, 14, 7, 9, 10, of which the mean is
10 (here N = 5). Now let C = 3. Then by (5.7) the mean of the new set
of scores formed by adding 3 to each of these scores is

$$M_{X+3} = 10 + 3 = 13$$

To verify this result we shall actually form the new set of scores and
determine its mean by application of (5.1). The new set is 13, 17, 10, 12, and 13,
for which the mean is

$$M_{X+3} = \frac{13 + 17 + 10 + 12 + 13}{5} = \frac{65}{5} = 13$$

Or if C = −2, the mean of the new set as given by (5.7) is

$$M_{X-2} = 10 + (-2) = 8$$

Verifying as before, the new set is 8, 12, 5, 7, and 8, for which the
mean is

$$M_{X-2} = \frac{8 + 12 + 5 + 7 + 8}{5} = \frac{40}{5} = 8$$

Proof. By (5.1),

$$M_{X+C} = \frac{\sum_{i=1}^{N} (X_i + C)}{N}$$

Now applying (3.20) we have

$$M_{X+C} = \frac{\sum X_i + \sum C}{N}$$

And by (3.21)

$$M_{X+C} = \frac{\sum X_i + NC}{N}$$

From this equality we have

$$M_{X+C} = \frac{\sum X_i}{N} + \frac{NC}{N} = \bar{X} + C$$

which establishes the rule.


It should be noted that C may be either a positive or a negative number 
and hence the rule holds in the case of subtracting a constant from each 
score as well as in the case of adding a constant to each score. 


RULE 5.4. Let each of N scores be multiplied by a constant amount C.
Then the mean of the new set of scores thus formed is equal to the mean of the
original set multiplied by this constant amount. Or symbolically,

$$M_{CX} = C\bar{X} \tag{5.8}$$

Example. Consider the 5 scores 21, 9, 12, 6, 12, of which the mean is
12 (here N = 5). Now let C = 2. Then by (5.8) the mean of the new set of
scores formed by multiplying each of these scores by 2 is

$$M_{2X} = (2)(12) = 24$$

To verify this result we shall actually form the new set of scores and
determine its mean by application of (5.1). The new set is 42, 18, 24, 12,
and 24, for which the mean is

$$M_{2X} = \frac{42 + 18 + 24 + 12 + 24}{5} = \frac{120}{5} = 24$$

Or if C = 1/3, the mean of the new set as given by (5.8) is

$$M_{X/3} = (1/3)(12) = 4$$

Verifying as before, the new set is 7, 3, 4, 2, and 4, for which the mean is

$$M_{X/3} = \frac{7 + 3 + 4 + 2 + 4}{5} = \frac{20}{5} = 4$$

Proof. By (5.1)

$$M_{CX} = \frac{\sum C X_i}{N}$$

Applying (3.19) we may write

$$M_{CX} = \frac{C \sum X_i}{N} = C\bar{X}$$

which establishes the rule.

It should be noted that C may be either an integer or a fraction. Hence,
by letting C be a fraction of the type 1/a the relationship holds in the case of
dividing each score by a constant as well as in the case of multiplying each
score by a constant. It should also be observed that (5.7) and (5.8) may be
applied in combination. Thus, if each score of a set is multiplied by a
constant C and then increased by a constant amount D, the mean of the
new set thus formed is given by

$$M_{CX+D} = C\bar{X} + D \tag{5.9}$$


Verification of (5.9) in the case of a specific example and a formal state- 
ment of its proof is left as an exercise. 
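A numerical check of the rules for adding and multiplying by a constant is easily sketched. The scores are those of the two examples above; the constants C = 2 and D = 5 used for the combined case are chosen here purely for illustration, and exact fractions are used so that division by 3 introduces no rounding.

```python
from fractions import Fraction


def mean(values):
    """Exact arithmetic mean: sum of the values divided by their number."""
    values = [Fraction(v) for v in values]
    return sum(values) / len(values)


x = [10, 14, 7, 9, 10]   # scores of the Rule 5.3 example, mean 10
y = [21, 9, 12, 6, 12]   # scores of the Rule 5.4 example, mean 12

# Adding a constant (C = 3, then C = -2) shifts the mean by that constant.
mean_plus_3 = mean([v + 3 for v in x])
mean_minus_2 = mean([v - 2 for v in x])

# Multiplying by a constant (C = 2, then C = 1/3) scales the mean.
mean_times_2 = mean([2 * v for v in y])
mean_over_3 = mean([Fraction(v, 3) for v in y])

# Combined case: multiply by C, then add D (illustrative C = 2, D = 5).
C, D = 2, 5
mean_combined = mean([C * v + D for v in y])

print(mean_plus_3, mean_minus_2, mean_times_2, mean_over_3, mean_combined)
```

A check of this kind verifies the rules for particular numbers only; it is not a substitute for the general proof.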


5.8 A PROPERTY OF THE MEAN 


Suppose that some point is selected on the scale of values of a given
collection of scores. We shall call this point A. Now suppose that for each
score larger than A the difference or distance between the score value and A
is determined, and that these differences are summed. Suppose further that
the corresponding sum is determined for all scores smaller than A. Now
it is a characteristic of the mean that these two sums would be equal if, and
only if, the point A is located at the mean. In other words the mean
possesses the property that the aggregate of the distances from it of the scores
lying above it is the same as that of the scores lying below it.

In statistical terminology the distance of a score from a point on the
score scale is referred to as the deviation of the score from that point. It is
customary to compute these deviations by subtracting the value of the
point from that of the score. Hence, if algebraic signs are retained, the
deviations of scores having values greater than that of the point are positive,
whereas those of scores having values less than that of the point are
negative. If the point involved is taken at the mean, the net (algebraic)
sum of all the deviations will be exactly zero, for the sum of the positive
deviations will be exactly canceled or offset by the sum of the negative
deviations. A formal statement of this property of the mean together with
an illustrative example and a general proof follow.

RULE 5.5. The algebraic sum of the deviations of N scores from their
mean, $\bar{X}$, is zero. Or symbolically,

$$\sum (X_i - \bar{X}) = 0 \tag{5.10}$$


Just as it is common practice to represent any score by the upper-case
\(X_i\), it is common practice to represent the deviation of this score from the
mean of the collection to which it belongs by the lower-case \(x_i\). That is,

\[ x_i = X_i - \bar{X} \]    (5.11)

Hence, (5.10) may also be written

\[ \sum x_i = 0 \]    (5.12)

Example. Consider the scores 10, 7, 12, 15, and 11, of which the mean
is 11. The deviations of these scores from their mean are respectively -1,
-4, +1, +4, and 0. To verify the application of (5.10) in the case of this
example, we need only find the algebraic sum of these deviations. That is,

\[ \sum x_i = -1 - 4 + 1 + 4 + 0 = 0 \]
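
The same check can be sketched in a few lines of Python. This is offered only as an illustration of Rule 5.5 using the five scores of the example; the variable names are chosen here for convenience.

```python
from statistics import mean

scores = [10, 7, 12, 15, 11]
m = mean(scores)                                   # the mean of the collection
deviations = [x - m for x in scores]               # x_i = X_i - mean, signs retained

positive_sum = sum(d for d in deviations if d > 0) # aggregate distance of scores above the mean
negative_sum = sum(d for d in deviations if d < 0) # aggregate distance of scores below the mean
print(deviations, sum(deviations), positive_sum, negative_sum)
```

The positive and negative sums are equal in size and opposite in sign, which is the property described at the opening of this section.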


Proof. Given a set of N scores \(X_i\) (\(i = 1, 2, \ldots, N\)) with mean \(\bar{X}\).
Now the deviations of these scores from \(\bar{X}\), i.e., the values \(x_i\)
(\(i = 1, 2, \ldots, N\)), may be obtained by adding to each score the constant
\(-\bar{X}\). Hence, by application of (5.7) where \(C = -\bar{X}\), we have

\[ M_x = \bar{X} - \bar{X} = 0 \]

Now applying (5.2) we may write

\[ \sum x_i = N M_x = N(0) = 0 \]

which establishes the rule.
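
Alternative proof.

\[ \sum (X_i - \bar{X}) = \sum X_i - \sum \bar{X} \]    [by (3.20)]
\[ = \sum X_i - N\bar{X} \]    [by (3.21)]
\[ = N\bar{X} - N\bar{X} = 0 \]    [by (5.2)]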




median by an amount equal to 5c - 4c, or c. Therefore, in this situation
the sum of the absolute values of the deviations of these 9 scores is smaller
when these deviations are measured from the median than when measured
from A.*

Case II: A collection consisting of 8 (an even number) scores. Consider
the score scale shown in Figure 5.4, along which 8 scores have been plotted. 


Figure 5.4 Scale showing values of a collection of 8 scores 


Since this collection consists of an even number of scores, the median is
indeterminate in the sense that any score point between \(X_4\) and \(X_5\) satisfies
the definition of the median. The argument which follows holds for any
value between \(X_4\) and \(X_5\), but so that it may be stated as definitely as
possible, we shall follow the convention previously suggested (see Section 4.8)
and locate the median at a point midway between \(X_4\) and \(X_5\). As before, A
is some point on this scale not a median so that A cannot be located in the
interval between \(X_4\) and \(X_5\), all points of which are median points. The
distance between A and the arbitrarily selected median point is c.

Again it is clear that the aggregate of the absolute deviations of the
scores \(X_5\), \(X_6\), \(X_7\), and \(X_8\) is 4c greater when these deviations are measured
from A than when measured from the median, whereas the aggregate of the
absolute deviations of the scores \(X_1\), \(X_2\), and \(X_3\) is 3c less when the
deviations are measured from A instead of the median. Hence, for these 7 scores
the aggregate of the absolute value of their deviations from A is 4c - 3c,
or c more than from the median. Now the remaining score, \(X_4\), may be
closer to A than to the median, but since it lies between A and the median,
the amount by which it is closer to A must necessarily be less than c. Hence,
again in this situation the sum of the absolute values of the deviations of
these 8 scores is smaller when the deviations are measured from the median
than when measured from A.

Finally, it should be noted that the application of the criterion of
aggregate proximity, that is, the use of a score value to which all the scores in
a collection are closest, as a definition of typicalness is a practice which
would meet with general acceptance. Hence, when the purpose of an
average is to portray or represent the "typical" score in a collection, the median
of the collection should usually be the average employed. Further
justification for this recommendation is given in subsequent sections of this chapter.

*The student may find it instructive to repeat this argument using the point A'
shown in Figure 5.3.
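
The argument of Cases I and II can also be checked numerically. The Python sketch below is an illustration under stated assumptions: the two collections of 9 and 8 scores are hypothetical (they are not the scores plotted in Figures 5.3 and 5.4), and `aggregate_distance` is a helper name introduced here for the sum of the absolute deviations from a point.

```python
from statistics import median

def aggregate_distance(scores, point):
    """Sum of the absolute deviations of the scores from the given point."""
    return sum(abs(x - point) for x in scores)

nine_scores = [2, 4, 5, 7, 8, 10, 13, 14, 20]   # hypothetical, odd number of scores
eight_scores = [2, 4, 5, 7, 9, 10, 13, 14]      # hypothetical, even number of scores

for scores in (nine_scores, eight_scores):
    mdn = median(scores)
    at_median = aggregate_distance(scores, mdn)
    # Try every half-unit point A between the smallest and largest score.
    span = max(scores) - min(scores)
    points = [min(scores) + 0.5 * k for k in range(2 * span + 1)]
    smallest = min(aggregate_distance(scores, a) for a in points)
    print(len(scores), "scores: median", mdn,
          "aggregate at median", at_median, "smallest aggregate found", smallest)
```

For the collection of 8 scores every point between the two middle scores gives the same aggregate, in keeping with the remark that all such points are median points.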




5.10 SELECTION OF AN AVERAGE: REPRESENTING THE TYPICAL SCORE
OF A UNIMODAL DISTRIBUTION CONTAINING EXTREME SCORES


We shall first consider the effect of extreme scores upon the mode, 
median, and mean in the case of some simple numerical examples. The 
extreme scores included in these illustrative collections are quite unrealistic 
in the sense that they differ so markedly from the other scores involved 
that they clearly do not appropriately belong in the same collection. This 
was permitted, nonetheless, in order to provide examples that would be
particularly striking in demonstrating the effects under consideration. 
While in more realistic collections these effects would not be as extreme, 
they would still be of the same general character. 

[Figure 5.5 Distributions showing effect of extreme scores on averages
Distribution A: Mo = Mdn = M = 83; PR(M) = 50
Distribution B (extreme score 145): Mo = Mdn = 83; M = 86; PR(M) = 95
Distribution C (extreme score 21): Mo = Mdn = 83; M = 80; PR(M) = 5
Distribution D (extreme score 143): Mo = 83; Mdn = 84; M = 87; PR(M) = 95]

Figure 5.5 shows the histograms of four collections of scores. The
numbers entered in the rectangles are the frequencies associated with each score
value. Each collection pictured involves 20 scores. Given below each
histogram are the values of the three averages for the distribution and
the percentile rank of the mean. Distribution A is a unimodal symmetrical
distribution. In such a distribution, of course, all three averages locate
the exact center of the distribution and must necessarily have
the same value. Distribution B is the same as Distribution A except that
one score has been changed from 85 to 145. This change can
have no effect on the mode, and since the score changed was and remains


above the median, the value of that average will also be unaffected. On 
the other hand, the effect of this one extreme score on the mean is very 
marked—so marked, in fact, that the value of the mean is now larger than 
that of 95 per cent of the scores in the collection, and hence can scarcely be 
regarded as a "typical" or "representative" value. Distribution C, which 
is the mirror image of Distribution B, shows that an extremely small score
can pull the value of the mean downward just as markedly as an extremely 
large score can raise it. 

Distribution D is J-shaped, with one score being extremely larger than 
the rest. The modal value remains at 83, but since scores which in Dis- 
tribution A were below the median have now been shifted above it, the 
median of Distribution D will necessarily be higher than that of Distribu- 
tion A. The change in the median, however, is not nearly as marked as the 
change in the mean—a change due almost entirely to the presence in the 
distribution of a single extreme score. 

The question naturally arises in the case of a distribution like D as to 
which average should be employed if the purpose is to select or provide a 
value typical of the values of the scores comprising the collection. Clearly 
the mean value is atypical. Some might argue that the modal, or most
frequently occurring, value is more appropriate to this purpose than the 
median. It will be noted, however, that while the mean value is larger than 
95 per cent of the scores in this distribution, the modal value is smaller 
than 82.5 per cent of the scores. Moreover, in terms of the criterion of most 
frequent occurrence, there is very little basis for choice between the modal 
value of 83 and the median value of 84. This is usually the case in most 
unimodal distributions even in instances of rather marked skewness.
Consequently, the most appropriate of the averages in situations of this type
is the median which satisfies not only the criterion of equal numbers of 
smaller and larger values, but also the criterion of aggregate proximity 
(5.13). (In Distribution D the aggregate of the score distances from the 
mean value of 87 is 112, from the modal value of 83 is 80, and from the 
median value of 84 is 74.) 
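
The figures just quoted can be reproduced with a short Python sketch. The frequencies used below are an assumption: they are one hypothetical set of 20 scores chosen to agree with the averages and aggregates stated for Distribution D, and are not necessarily the frequencies entered in Figure 5.5. The name `aggregate_distance` is likewise introduced here.

```python
from statistics import mean, median, mode

# Hypothetical frequencies (score value: frequency), assumed for illustration.
frequencies = {83: 7, 84: 6, 85: 4, 86: 2, 143: 1}
scores = [value for value, f in frequencies.items() for _ in range(f)]

def aggregate_distance(scores, point):
    """Sum of the absolute deviations of the scores from the given point."""
    return sum(abs(x - point) for x in scores)

averages = {"mean": mean(scores), "mode": mode(scores), "median": median(scores)}
aggregates = {name: aggregate_distance(scores, value)
              for name, value in averages.items()}
print(averages)
print(aggregates)
```

With these assumed frequencies the three aggregates come to 112, 80, and 74, the smallest being that measured from the median.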

For examples of the relative magnitudes of the three averages in more
realistic distributions involving extreme scores, the student should refer to
Tables 2.9 and 2.10. In the case of the distribution of annual incomes
shown in Table 2.9, the modal value is $125,* the median value is $1,250,
and the mean value is approximately $1,795. Here approximately 94 per
cent of the values in the distribution are above the modal value and
approximately 65 per cent are below the mean value. The aggregate of the
score differences from the mode and mean are 1,669,875 and 1,282,035
respectively, while from the median this aggregate is only 1,207,875. It is
again clear that the median value is most appropriate for the purpose of
representing the typical individual.

*$125 is the midpoint of the modal class. Since the classes vary in size, it is
necessary, in order to determine the portion of the scale in which the greatest
concentration of values occurs, to express the class frequencies as proportions of
the class size. When this is done the class which has the greatest concentration
of scores per unit of class size is this lowest class.

The situation is similar in the case of the distribution of years of service 
shown in Table 2.10. In this distribution the modal value is 0.5, the median 
value is 6.2 and the mean value is 9.7. Approximately 90 per cent of the 
measures in this collection exceed the modal value, while some 61 per cent 
are smaller than the mean value. The aggregate of the score differences 
from the mode and mean are 3,324.0 and 2,908.4 respectively, while from 
the median this aggregate is 2,765.5. As in the previous example, the 
median is the most appropriate average for the purpose of representing the
typical individual.


5.11 SELECTION OF AN AVERAGE: INTEREST CENTERED ON TOTAL 
RATHER THAN TYPICAL 

In Section 5.4 we pointed out that of the three averages considered,
only the mean is a function of the value of each score. It is, of course, this
property which results in the sensitivity of the mean to extreme scores.
The mode is completely unaffected by any change in a score value that does
not alter the location of the major concentration of scores. The median is
insensitive to any changes in score values which do not alter the balance
of the proportions of scores above and below it. Hence, the median of a
given collection of scores will be affected only by such changes in score
values as may result in a shifting of these scores past the original value of
the median. Thus a teacher seeking to raise the median performance of her
class on some test will find it more profitable to concentrate her
instructional efforts on those individuals whose initial performance is slightly
below the original value of the median, for it is this group
of pupils whose performance she will be most likely to succeed in raising
past this original median value. But this, of course, represents an
instructional procedure of dubious value, for as a teacher she should be concerned
with improving the performance levels of all her pupils.

The foregoing example illustrates the inappropriate selection of an
average for the basic purpose at hand. This basic purpose is not, or
certainly should not be, one of simply raising the value of a specific average,
but rather one of raising the performance level of a class as a whole. If
the extent to which this purpose is accomplished is to be reported in
the form of an average of the final test scores, that average should be
employed which is based on the total performance level of the class as a whole.
This, of course, implies the use of the mean, which is the only one of the
three averages considered that is based on the aggregate or total of the


score values. This average, unlike the median or mode, will be sensitive 
to any change in the performance level of any individual pupil. 
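
The contrast between the two averages in the teacher's situation may be sketched as follows. The nine class scores are hypothetical, and `with_change` is a helper name introduced here; the sketch merely illustrates the point made above and is not a procedure taken from the text.

```python
from statistics import mean, median

original = [40, 45, 50, 55, 60, 65, 70, 75, 80]   # hypothetical class scores

def with_change(scores, old, new):
    """Copy of scores in which one occurrence of old is replaced by new."""
    changed = list(scores)
    changed[changed.index(old)] = new
    return changed

far_below = with_change(original, 40, 58)     # large gain, pupil still below the median
past_median = with_change(original, 55, 62)   # small gain that carries a pupil past the median

for label, scores in [("original", original),
                      ("gain far below median", far_below),
                      ("gain past median", past_median)]:
    print(label, "median", median(scores), "mean", round(mean(scores), 2))
```

The gain of 18 points by the lowest pupil leaves the median untouched but raises the mean, whereas the gain of 7 points by a pupil near the median raises the median while contributing less to the mean.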

As a second example of a situation in which the total is of greater con- 
cern than the typical, consider two communities of comparable size, one,
Community A, in which the ownership of real and personal property is 
largely concentrated in the hands of a relatively small number of individuals 
and another, Community B, in which this ownership is much more widely 
dispersed. Suppose, then, that in Community A the median assessed value 
of real and personal property owned by each individual is $250, while the 
corresponding value for Community B is $2,500. Suppose further, however, 
that in Community A there are a few extremely valuable properties so that 
the mean assessed value of property owned by each individual is $3,500, 
while for Community B this mean value is $3,000. Now, if the school pro- 
grams in these communities are supported by a direct millage levy on the 
property owners, which community is in the stronger financial position? 
That is, in which community will a given millage levy produce the greater 
income? The answer is clearly that community which has the greater total 
assessed valuation, for the total tax income (assuming no tax delinquency)
is simply the product of the millage levy times the total assessed valuation. 
Now, since the mean is the average related to total, that community which 
has the greater mean assessed valuation will also have the greater total 
assessed valuation (the two communities being of the same size). Hence,
other factors (such as indebtedness) being equal, Community A is in the 
stronger financial position as regards the support of its school program. 
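
The arithmetic behind this conclusion is sketched below in Python. The medians and means are those given in the example; the number of property owners (1,000 in each community) and the levy of 10 mills are assumptions made here solely so that totals can be computed.

```python
# Assumed values, not given in the text.
n_owners = 1000     # number of property owners in each community
levy_mills = 10     # 10 mills = $10 of tax per $1,000 of assessed valuation

# Median and mean assessed values per individual, as stated in the example.
communities = {
    "A": {"median": 250, "mean": 3500},
    "B": {"median": 2500, "mean": 3000},
}

income = {}
for name, values in communities.items():
    total_valuation = values["mean"] * n_owners      # total = mean x number of individuals
    income[name] = total_valuation * levy_mills / 1000
    print(name, "total valuation", total_valuation, "tax income", income[name])
```

The medians play no part in the computation; whatever common size is assumed, the community with the larger mean yields the larger income.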

These examples should suffice to establish the conclusion that when the
purpose to which an average is to be put has to do with the total or
aggregate of the collection of scores involved, then the appropriate average to
employ is the mean. It alone of the three averages here considered is
related to total.


5.12 SELECTION OF AN AVERAGE: CASE OF MULTIMODAL DISTRIBUTIONS


Suppose that we are concerned with a multimodal distribution (for
example see Figure 5.1) and that we wish to use an average for the purpose
of representing the typical score value. If the situation further demands
the use of a single-valued average, as would be the case were our purpose
to compare the typical score value for this distribution with that of some
other distribution for which only a single-valued average could be obtained,
the appropriate choice would be the median; that is, the comparison should
be made between the medians of the two distributions.

If, on the other hand, the situation does not demand the use of a
single-valued average, a more complete picture of the typical score of a
multimodal distribution would be the multi-valued index consisting of
the modal values of the distribution. This amounts to reporting the
location of each major concentration of scores. Thus, given the modal values
13, 21, and 30 (see Figure 5.1), we know that while the score value 13 is 
typical of a substantial portion of the distribution, the score value 21 is 
typical of another substantial portion of the distribution, and the score 
value 30 of still another such portion. 
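
A multi-valued index of this kind can be obtained as in the following Python sketch. The frequencies are hypothetical and were chosen only so that the modal values are 13, 21, and 30; they are not the frequencies of Figure 5.1. Note also the simplifying assumption that the three peaks have equal frequency, since `statistics.multimode` reports only the values tied for the greatest frequency.

```python
from statistics import median, multimode

# Hypothetical frequencies (score value: frequency) with three equal peaks.
frequencies = {12: 3, 13: 5, 14: 3, 20: 3, 21: 5, 22: 2, 29: 2, 30: 5, 31: 3}
scores = [value for value, f in frequencies.items() for _ in range(f)]

modes = multimode(scores)        # multi-valued index: every most frequent value
single_value = median(scores)    # single-valued alternative when one number is required
print(modes, single_value)
```

Where the concentrations are of unequal height, as is usual in practice, the modal values would have to be located by inspecting the distribution itself.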


5.13 SELECTION OF AN AVERAGE: EXPECTED VALUE


In order to introduce a useful mathematical concept—that of expected 
value—we shall propose a simple hypothetical game played by two persons. 
The game is to be played with a deck of 100 cards. The cards are identical
on the back, but differ on the face in that 45 of the cards are blank, 10
contain a large black dot, 40 a large blue dot, 4 a large yellow dot, and 1 a large
red dot. One of the players, known as the Banker, shuffles the deck and
then spreads the cards face down on a table. The other player, known as
the Drawer, selects a single card of his choice. If the card thus selected is
one of the blank cards, he receives no payment from the Banker. If the
card selected contains a black dot, he receives $1.00 from the Banker. If
the card selected contains a blue dot, he receives $2.00 from the Banker.
If the card contains a yellow dot, he receives $5.00, and if the card drawn
is the one with the red dot, he receives $890.00 from the Banker. Each time
the game is played the Drawer must pay the Banker an amount which will
make the game a fair one for both players. The question is, what should
this amount be?


Since this is purely a game of chance in the sense that no element of
skill is involved, we shall define the game as fair if both players could expect
to break even (i.e., neither win nor lose) in the long run.* Now it may be
expected that in the long run the Drawer would select a given kind of card
with the same relative frequency as that of this kind of card in the playing
deck. Thus, for example, the Drawer could expect, over a long period of
play, to select a blue-dot card forty-hundredths of the time. If N is used
to represent a very large number of plays, the number of times which the
Drawer could expect to select a blue-dot card is forty-hundredths of N
(.40N). Since the value to him of each such selection is two dollars, the
total value of the expected number of such selections in N games would be
the product of two dollars times .40N, or .80N dollars.

*Theoretically, "long run" implies an infinity of repetitions of the game. Repetition
implies that on each occasion the game situation must be duplicated. That is, if the
same rather than a new deck is used on each repetition the card drawn must be returned
to the deck before the shuffle for the next game. Unless this is done the deck will not be
the same from one game to the next. That is, the game situation on one occasion will
differ from that on another and repetition will not be taking place.

Table 5.6 summarizes the nature of the deck and shows the long-run
expected frequency and the long-run expected receipts of the Drawer for
each type of card. It also shows that the total of these expected receipts
in N plays would be 10N dollars. Now if in this large number of plays the
Drawer may expect to receive a total of 10N dollars, then according to our
definition of a fair game, the Banker must also expect to receive 10N
dollars. If this total is divided by the number of plays, N, we obtain $10
as the amount which the Drawer must pay the Banker each game.


TABLE 5.6 Nature of Deck and Long-Run Expected Outcomes
of a Hypothetical Two-Person Game

Kind of Card | Value to Drawer | Frequency in Deck | Relative Frequency in Deck | Expected Frequency in N Plays | Expected Amount to Be Received by Drawer in N Plays
Blank        | $0              | 45                | .45                        | .45N                          | $0
Black Dot    | $1              | 10                | .10                        | .10N                          | $0.10N
Blue Dot     | $2              | 40                | .40                        | .40N                          | $0.80N
Yellow Dot   | $5              | 4                 | .04                        | .04N                          | $0.20N
Red Dot      | $890            | 1                 | .01                        | .01N                          | $8.90N
Total        |                 | 100               | 1.00                       | N                             | $10.00N


This payment thus makes the game a fair one for either player. The 
Drawer pays the Banker a constant amount, $10, for the privilege of draw- 
ing a card. The Banker pays the Drawer a variable amount, depending 
on the card drawn. In the long run the variable amount that the Drawer 
receives will balance the constant amount that the Banker receives. Theo- 
retically, then, the game is worth the same amount to each player even for 
a single play. This amount, $10, is known as the expected value of the game 
for both Drawer and Banker. 
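
The computation summarized in Table 5.6 can be sketched in Python as follows. The card counts and payments are those of the game described above; the use of exact fractions and the variable names are choices made here for the illustration.

```python
from fractions import Fraction

# Kind of card: (number in deck, payment to Drawer in dollars), as in the text.
deck = {
    "blank": (45, 0),
    "black dot": (10, 1),
    "blue dot": (40, 2),
    "yellow dot": (4, 5),
    "red dot": (1, 890),
}

total_cards = sum(count for count, _ in deck.values())

# Expected value: each payment weighted by the relative frequency of its card.
expected_value = sum(Fraction(count, total_cards) * payment
                     for count, payment in deck.values())

# Proportion of cards paying more than the fair fee.
chance_of_net_gain = sum(Fraction(count, total_cards)
                         for count, payment in deck.values()
                         if payment > expected_value)
print(total_cards, expected_value, chance_of_net_gain)
```

Exact fractions are used so that the relative frequencies .45, .10, .40, .04, and .01 are represented without rounding.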

It is clear that the term "expected value" applies only to what each 
player can expect to receive per game in the long run. The net value of the 
game to each player is, of course, zero, since each player's expected value 
is offset, in the long run, by the amount he must pay out. In fact, the
Banker's expected value is the Drawer's fee for playing, and in the long run, 
the Drawer's expected value is the Banker's fee for playing. 

It is important to note that the $10 expected value of one play of the 
game is a long-run value. The short-run results may deviate quite far from 
this expected value, and it is for this reason only that the game is of any 
interest at all to the players. It is, in fact, impossible for the Drawer to 
balance his $10 payment for a single draw by an equal payment from the 
Banker, since there is no ten-dollar card. Only in the long run will payments 
and receipts offset one another. On a single play the Drawer has only one 
opportunity to come out ahead, and that is by drawing the red dot or $890 
card. All other cards return him less than his $10 fee. The Drawer can
thus expect a net gain on only 1 per cent of the plays, in the long run, but


the winnings on these occasions are large enough to balance the expected 
losses he will incur the other 99 per cent of the time. 

The percentages referred to in the preceding paragraph are long-run 
expectations only. The exact order in which the outcomes occur cannot 
be anticipated. The Drawer may draw the red-dot card two or three times
or more in the first 100 plays. If his "luck" results in such early winnings he
may withdraw from the game financially ahead, or if his "luck" does not
result in such early winnings, he may be forced to withdraw a loser. It is
the possibility of a favorable sequence of outcomes that attracts players to
participation in fair games and even in unfair games.
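
The difference between short-run and long-run results can be illustrated by simulating the game. The sketch below is an assumption-laden illustration: the seed, the number of plays (400,000), and the variable names are arbitrary choices made here, and a finite number of plays can only approximate the long run. Drawing with `random.Random.choices` corresponds to returning the card to the deck before each new shuffle.

```python
import random

payments = [0, 1, 2, 5, 890]     # dollars paid for blank, black, blue, yellow, red
counts = [45, 10, 40, 4, 1]      # number of each kind of card in the deck
fee = 10                         # the fair fee derived in the text

rng = random.Random(1960)        # arbitrary seed so that the run can be repeated
n_plays = 400_000                # arbitrary "large" number of plays

# Each play draws one card from the full deck (sampling with replacement).
draws = rng.choices(payments, weights=counts, k=n_plays)

average_receipt = sum(draws) / n_plays
share_of_plays_ahead = sum(1 for d in draws if d > fee) / n_plays

first_100 = draws[:100]
net_after_100 = sum(first_100) - fee * len(first_100)   # Drawer's net position after 100 plays
print(round(average_receipt, 2), round(share_of_plays_ahead, 4), net_after_100)
```

Over the full run the average receipt should lie near $10 and the share of winning plays near 1 per cent, while the net position after the first 100 plays depends heavily on whether the red-dot card happened to appear.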

Now the expected value of $10 for one play of our hypothetical game
is actually the mean value of the cards in the deck. That is,

\[ E(V) = \frac{(.45N)(\$0) + (.10N)(\$1) + (.40N)(\$2) + (.04N)(\$5) + (.01N)(\$890)}{N} \]

\[ = (.45)(\$0) + (.10)(\$1) + (.40)(\$2) + (.04)(\$5) + (.01)(\$890) \]

\[ = \frac{(45)(\$0) + (10)(\$1) + (40)(\$2) + (4)(\$5) + (1)(\$890)}{100} \]

\[ = M(V) = \$10 \]

where E(V) represents the expected value of one play and M(V) represents
the mean value of the cards in the deck. This result is consistent with the
important property of the mean expressed in (5.10) which states that the
algebraic sum of the deviations of a collection of scores from their mean is
zero, or what amounts to the same thing, that the sum of the positive
deviations from the mean (gains or winnings) equals the sum of the negative
deviations (losses).

We shall now define the expected value (also called mathematical
expectation) in more general terms. Consider the notational scheme
presented in Section 3.7 for representing any relative frequency distribution.
In this scheme \(X_j\) represents the value of the scores in a class. This is
analogous to the value of any set of identical cards in our game. Now
suppose we select a single score (single card) from the entire collection by
some purely chance or random procedure. Let this selection procedure be
repeated a very large number of times, N, with the selected score being
returned to the distribution each time so that the process always involves
drawing from the same distribution. Then the expected proportion of times
\(X_j\) would be selected is the same as the proportion of \(X_j\) values in the
distribution, i.e., \(p_j\), and the expected number of \(X_j\) values would therefore
be \(Np_j\). (Compare this with the expected frequencies shown in Table 5.6 for the
various types of cards in N plays.) The expected total of the values selected
from the j-class would be \(Np_jX_j\) and the expected grand total of the values
selected from all classes would be given by



\[ \sum_{j=1}^{c} N p_j X_j = N \sum_{j=1}^{c} p_j X_j \]    [see (3.19)]

(Compare with the total expected amount to be received by the Drawer in
N plays.) Now we shall define the expected value, E(X), of the X selected
on a single trial or draw as the quotient of this expected long-run grand total
divided by the number of trials, N, that is,

\[ E(X) = \sum_{j=1}^{c} p_j X_j \]    (5.14)

Having defined the expected value of X, or E(X), by formula (5.14), 
we are now ready to establish a result analogous to that which we derived 
from our card game. That is, we shall show that the expected value of a 
score drawn at random is the mean of the scores, just as the expected value 
of a card drawn at random was the mean value of the cards. 


RULE 5.7 The expected value (mathematical expectation) of a score
selected by some chance (random) procedure from a score distribution is the
mean of the distribution. Or symbolically,

\[ E(X) = \bar{X} \]    (5.15)
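Proof.

\[ \bar{X} = \frac{1}{N} \sum_{j=1}^{c} f_j X_j \]    (1)

\[ p_j = \frac{f_j}{N} \] so that \( f_j = N p_j \)

Hence,

\[ \bar{X} = \frac{1}{N} \sum_{j=1}^{c} N p_j X_j \]    [substituting for \(f_j\) in (1)]
\[ = \frac{N}{N} \sum_{j=1}^{c} p_j X_j \]    [by (3.19)]
\[ = E(X) \]    [by (5.14)]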
= 


As a by-product of this proof we have the following rule: 


RULE 5.8 The mean of a distribution presented in terms of relative
frequencies is given by

\[ \bar{X} = \sum_{j=1}^{c} p_j X_j \]    (5.16)
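
Rules 5.7 and 5.8 may be checked on a small example. In the Python sketch below the frequency distribution is hypothetical and the variable names are introduced here; exact fractions are used so that the two results can be compared without rounding.

```python
from fractions import Fraction

# Hypothetical frequency distribution (score value X_j: frequency f_j).
frequencies = {3: 2, 4: 5, 5: 8, 6: 4, 7: 1}
n = sum(frequencies.values())                       # N, the number of scores

# Mean computed from the frequencies: (1/N) * sum of f_j * X_j.
mean_from_frequencies = Fraction(sum(f * x for x, f in frequencies.items()), n)

# Relative frequencies p_j = f_j / N, then E(X) = sum of p_j * X_j as in (5.14).
proportions = {x: Fraction(f, n) for x, f in frequencies.items()}
expected_value = sum(p * x for x, p in proportions.items())

print(mean_from_frequencies, expected_value)
```

The two quantities agree, as (5.15) and (5.16) require.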




The student should not infer from the character of the example used 
(the hypothetical gambling game) that mathematical expectation is a
theoretical concept of no practical importance save, perhaps, to gamblers. 
The concept of mathematical expectation gives us a criterion for evaluating 
a single outcome of the type of event which has a number of possible out- 
comes—provided, of course, that we know from long-run experience how 
frequently each of these different possible outcomes tends to occur in prac- 
tice. Thus, if for each of a large collection of a certain type of TV picture 
tube the number of hours of useful life is known, the concept of mathe- 
matical expectation can be applied to determine the expected life of a single 
picture tube of this type; such information would be of some importance 
in establishing a period of guarantee. The concept is employed in all
insurance plans. In the case of automobile collision insurance, for example, the
expected cost of damages incurred by a single individual over a period of
one year can be derived from the long-run experience of many drivers and
used as a basis for establishing insurance rates. Whenever, then, the
expected value of an event is required, the mean value of a large number of
outcomes of this event is the appropriate index to employ.


5.14 SELECTION OF AN AVERAGE: SUMMARY


We have previously called attention to the necessity in statistical work
of selecting procedures that are consistent with the purpose for which the
work is being done, as well as appropriate for the type of data involved.
As the foregoing sections indicate, the selection of an average permits no
exception to this basic principle. Moreover, we have not attempted in these
sections to catalogue completely the various purposes to which averages
may be applied or the various types of data which may be involved. It is
hoped, however, that the variety of purposes and situations considered is
sufficient to convince the student that there is no single average which is
best for all purposes and all types of data, and to impress upon him the
necessity of constant, careful attention to purpose and to the nature of the
data. The summary presented in Table 5.7 is limited to the purposes and
types of data specifically treated in the four foregoing sections.


TABLE 5.7 Summary of Conclusions of Sections 5.10, 5.11,
5.12, and 5.13 on the Selection of an Average

PURPOSE                                   | NATURE OF DATA             | APPROPRIATE AVERAGE
To Represent Typical Score Value          | Unimodal and symmetrical   | Choice immaterial since \(\bar{X}\) = Mdn = Mo*
                                          | Multimodal and symmetrical | Modes if multi-valued index usable, otherwise either Mdn or \(\bar{X}\), since Mdn = \(\bar{X}\)
                                          | Unimodal and skewed        | Mdn
                                          | Multimodal and skewed      | Modes if multi-valued index usable, otherwise Mdn
Interest in Aggregate of All Score Values | All types                  | Mean
To Represent Expected Value               | All types                  | Mean

*Unless sampling from a population is involved. In this case, for reasons which will
be developed in later chapters, it is usually best to use the mean.


5.15 JOINT USE OF AVERAGES


As would obviously be expected, a particular statistical analysis may
be carried out with more than one purpose in view. If these purposes
conflict insofar as the selection of an average is concerned, the only sensible
way to resolve the conflict is to use the average appropriate to each purpose,
that is, to use more than one average.

It should also be noted that the mean and median considered jointly
contain information regarding the asymmetry of a distribution. In Section
5.10 (see particularly Figure 5.5) we pointed out that while in symmetrical
distributions the values of the mean and median were the same, in
asymmetrical or skewed distributions the values of these averages differed due
to the greater sensitivity of the mean to the extreme score values present
in such distributions. It was observed that in distributions skewed to the
right, the value of the mean exceeds that of the median, while the reverse
is true in the case of distributions skewed to the left. Because the median
and mean behave in this manner, a comparison of them provides an
indication of the direction in which a distribution is skewed.
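
The comparison just described can be sketched in Python. The function name `skew_direction` and the three collections of scores are hypothetical; the right-skewed collection is an assumed set of 20 scores consistent with the averages quoted for Distribution D, and the left-skewed collection is its mirror image.

```python
from statistics import mean, median

def skew_direction(scores):
    """Direction of skewness suggested by comparing the mean with the median."""
    m, mdn = mean(scores), median(scores)
    if m > mdn:
        return "right"
    if m < mdn:
        return "left"
    return "none indicated"

right_skewed = [83] * 7 + [84] * 6 + [85] * 4 + [86] * 2 + [143]   # hypothetical
left_skewed = [166 - x for x in right_skewed]                       # mirror image about 83
symmetrical = [81, 82, 82, 83, 83, 83, 84, 84, 85]                  # hypothetical

for label, scores in [("right_skewed", right_skewed),
                      ("left_skewed", left_skewed),
                      ("symmetrical", symmetrical)]:
    print(label, mean(scores), median(scores), skew_direction(scores))
```

The result "none indicated" is worded cautiously because equality of the mean and median does not by itself establish that a distribution is symmetrical.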


5.16 GROUPING ERROR IN THE MEDIAN


As with the mean and the mode (see Sections 5.5 and 5.6), the value
of the median derived from a grouped frequency distribution may not agree
exactly with that of the median of the original values of the collection of
scores involved. In the case of the mean, this grouping error was attributed
to the failure of the interval midpoints to represent accurately the original
values of the scores classified in the intervals. In the case of the mode, it
was attributed to the fact that the arbitrary placement of the intervals


along the scale may result in a discrepancy between the midpoint of the 
modal interval and the most frequently occurring of the original score 
values. In the case of the median, however, this grouping error in general 
is due to the failure of the original score values to be evenly distributed 
throughout the interval containing the median, for in computing the value 
of the median (or, for that matter, of any percentile) from a grouped fre- 
quency distribution, it is assumed that the scores are thus distributed 
throughout the interval in which it falls. We shall consider here, in a general 
way, the effect of the failure of this condition to be satisfied upon the gross 
character of the grouping error present in the value of the median. 
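
The source of this grouping error can be seen in a small numerical sketch. In the Python code below the eleven raw scores and the real limits of the three intervals are hypothetical, and `grouped_median` is a helper written here to apply the usual interpolation, which assumes that the scores are evenly distributed throughout the interval containing the median.

```python
from statistics import median

def grouped_median(intervals):
    """Median from (lower_real_limit, upper_real_limit, frequency) triples,
    assuming scores are evenly distributed within the median interval."""
    n = sum(f for _, _, f in intervals)
    half = n / 2
    below = 0                                   # cumulative frequency below the interval
    for lower, upper, f in intervals:
        if f > 0 and below + f >= half:
            return lower + (half - below) / f * (upper - lower)
        below += f
    raise ValueError("no scores in the distribution")

raw_scores = [10, 12, 13, 15, 19, 19, 19, 19, 21, 23, 24]   # hypothetical original scores
limits = [(9.5, 14.5), (14.5, 19.5), (19.5, 24.5)]          # assumed real limits
intervals = [(lo, hi, sum(lo < x < hi for x in raw_scores)) for lo, hi in limits]

mdn_grouped = grouped_median(intervals)
mdn_original = median(raw_scores)
grouping_error = mdn_grouped - mdn_original
print(intervals, mdn_grouped, mdn_original, grouping_error)
```

Here four of the five scores in the middle interval lie near its upper limit, so the assumption of even distribution places the grouped median below the median of the original scores.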

First, however, it should be observed that situations do exist which
constitute exceptions in the sense that the median derived from the grouped
data may be free of grouping error, in spite of the fact that the original
scores are not evenly distributed throughout the particular interval or
intervals involved. One such exception occurs when, in the case of a grouped
distribution involving an even number of scores, the intervals are so placed
that one-half the scores fall below the upper real limit of one of the intervals.
In this situation, the median is actually not in an interval, but is at the
boundary point between two intervals so that the exact nature of the
distribution of the original scores within these two intervals is immaterial.*

* The value of this boundary point may not be the same as the value of the
point midway between the two middle score values, but this discrepancy is
not, strictly speaking, a matter of grouping error. Any point between these
two original score values is a median of the original score distribution, and
the boundary point involved is necessarily among the points so situated.
Hence the boundary point is a median of the original score distribution, even
though it may differ from the single value which we are arbitrarily
accustomed to using.

A second exception occurs when the grouped frequency distribution
of an underlying symmetrical score distribution involves an odd number of
intervals which are so placed along the scale that the midpoint of the middle
interval coincides with the central score value of the original distribution.
When this occurs, the proportion of scores below the lower real limit of this
interval is the same as the proportion of scores above its upper real limit,
and, assuming the scores within this interval to be evenly distributed
throughout, we would place the median at its midpoint. But this point in
this situation coincides with the central score value of the original distribu-
tion, and hence is free from grouping error, in spite of the fact that the
original score values may not actually be evenly distributed throughout
this middle interval (see Figure 2.4D).

Now both of the exceptional situations described depend for their
occurrence upon the placement of the intervals along the score scale. In
these situations, then, grouping error may or may not be present,
depending upon the placement of the classes. Since the placement of the




classes is determined by an arbitrary rule-of-thumb procedure, such grouping
error as may occur in these situations is just as likely to be positive
($Mdn_g$, the median of the grouped data, greater than $Mdn_o$, the median of
the original scores) as negative ($Mdn_g$ less than $Mdn_o$). Hence,
insofar as these special situations occur, the grouping error in the median 
is not systematic in direction. It should also be observed that while these 
special situations could conceivably arise in the case of asymmetrical dis- 
tributions, they always occur in those cases in which the original score 
distributions are symmetrical. Hence, we can conclude that such grouping 
error as may be present in medians is not systematic when those medians are 
computed from grouped frequency distributions prepared from symmetrical 
original score distributions. 

Figure 5.6 Histograms and frequency distributions of a hypothetical
collection of 100 test scores which are skewed to the right

Histogram A (original scores)

     X     f
    16     1
    15     1
    14     2
    13     2
    12     3
    11     4
    10     5
     9     6
     8     8
     7    14
     6    20
     5    26
     4     8
         100

    Mdn_o = 6.3

Histogram B (grouped scores)

      X       f    Ht (f/c)
    15-17     2      0.67
    12-14     7      2.33
     9-11    15      5.00
     6-8     42     14.00
     3-5     34     11.33
            100

    Mdn_g = 6.6
    E = Mdn_g − Mdn_o = 6.6 − 6.3 = +0.3

We shall next consider the character of the grouping error in medians
computed from grouped distributions prepared from original score distribu-
tions that are skewed. Histogram A of Figure 5.6 pictures the distribution
of the original values of a hypothetical collection of 100 test scores. The 
distribution of original score values is clearly positively skewed and is 
shown in the frequency table directly below this histogram. The median 
of this distribution is 6.3. Histogram B of Figure 5.6 presents a grouped 
frequency distribution for these same data, with intervals of size 3.* The 
heights of the rectangles comprising this latter histogram have been made 
one-third of the class frequencies so that the areas of these rectangles would 
equal their class frequencies. By this device, the area of the rectangle 
representing the frequency of a given class in Histogram B is made the same 
as the sum of the areas of the rectangles of Histogram A, which represent 
the frequencies of the score values included in this class. The latter rect- 
angles have been superimposed on Histogram B in the case of the class 
containing the median (see dotted lines in Histogram B). 

Now it will be observed that the assumption of an even distribution
of scores among the score values included in the class or interval 6-8 (i.e.,
the median interval) is not well satisfied owing to the fact that in situations
of this type more of the original values fall into that portion of the interval
which is nearer the mode of the whole distribution. Since interpercentile
distances are least at those portions of the scale where frequencies are
greatest (see Section 4.12), it follows that in positively skewed distributions,
the value of the median derived from grouped data will tend to be greater than
that derived from the original scores. This situation is reversed in the case of
distributions skewed to the left (see Figure 5.7 which presents distributions
that are the mirror image of those shown in Figure 5.6) so that in negatively
skewed distributions, the value of the median derived from grouped data tends
to be smaller than that derived from the original scores.
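The direction of this grouping error can be checked numerically. The Python sketch below is offered only as an illustration and is not part of the original text: the function names are hypothetical, and the frequencies are those tabulated for Figure 5.6, with the left-skewed case built as its mirror image. Both medians are found by the same interpolation, which assumes scores evenly spread within the interval containing the median.

```python
def interpolated_median(classes):
    """Median by linear interpolation.

    classes: list of (lower_real_limit, upper_real_limit, frequency),
    in ascending order of the score scale.
    """
    half = sum(f for _, _, f in classes) / 2
    cum = 0
    for lower, upper, f in classes:
        if f > 0 and cum + f >= half:
            return lower + (half - cum) / f * (upper - lower)
        cum += f
    raise ValueError("empty distribution")


def unit_classes(freqs):
    """Treat each integer score value as an interval of width 1."""
    return [(x - 0.5, x + 0.5, f) for x, f in sorted(freqs.items())]


def grouped_classes(freqs, start, width):
    """Group integer scores into classes start..start+width-1, and so on."""
    totals = {}
    for x, f in freqs.items():
        k = (x - start) // width
        totals[k] = totals.get(k, 0) + f
    return [(start + k * width - 0.5, start + (k + 1) * width - 0.5, f)
            for k, f in sorted(totals.items())]


# Score value -> frequency, as tabulated under Histogram A of Figure 5.6.
right_skewed = {4: 8, 5: 26, 6: 20, 7: 14, 8: 8, 9: 6, 10: 5,
                11: 4, 12: 3, 13: 2, 14: 2, 15: 1, 16: 1}
# Mirror image about 10, which reproduces the distribution of Figure 5.7.
left_skewed = {20 - x: f for x, f in right_skewed.items()}

mdn_o_right = interpolated_median(unit_classes(right_skewed))
mdn_g_right = interpolated_median(grouped_classes(right_skewed, 3, 3))
mdn_o_left = interpolated_median(unit_classes(left_skewed))
mdn_g_left = interpolated_median(grouped_classes(left_skewed, 3, 3))

print(round(mdn_o_right, 1), round(mdn_g_right, 1))  # 6.3 6.6
print(round(mdn_o_left, 1), round(mdn_g_left, 1))    # 13.7 13.4
```

Carried to more decimal places the two errors are about +0.34 and −0.34; the figures of ±0.3 in the text come from subtracting medians already rounded to one decimal place.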


* Of course, intervals as big as 3 should not be employed when the range
is only 12 and the resulting distribution is to be used for computational
purposes. This coarse grouping was employed here for convenience of
illustration only. See Sections 2.3 and 2.7.

Figure 5.7 Histograms and frequency distributions of a hypothetical
collection of 100 test scores which are skewed to the left

Histogram A (original scores)

     X     f
    16     8
    15    26
    14    20
    13    14
    12     8
    11     6
    10     5
     9     4
     8     3
     7     2
     6     2
     5     1
     4     1
         100

    Mdn_o = 13.7

Histogram B (grouped scores)

      X       f    Ht (f/c)
    15-17    34     11.33
    12-14    42     14.00
     9-11    15      5.00
     6-8      7      2.33
     3-5      2      0.67
            100

    Mdn_g = 13.4
    E = Mdn_g − Mdn_o = 13.4 − 13.7 = −0.3


5.17 MINIMUM INFORMATION NEEDED TO DETERMINE THE MEDIAN
OF A GROUPED FREQUENCY DISTRIBUTION

In defining the mean (see Section 5.4), attention was called to the fact
that of the three averages here considered, only the mean was dependent
upon the exact value of the individual scores in the collection. To appre-
ciate the extent to which the mean and median differ in this respect, it
will be helpful to note what a small amount of information about a grouped
frequency distribution is essential to the determination of its median. Only
four basic facts or pieces of information are required. These may be pre-
sented in a variety of ways, but in the final analysis they may be reduced
to the following: 


1) $N$, the total number of scores in the collection.
2) $f_{50}$, the frequency of the interval containing the median.
3) $U_{50}$ and $L_{50}$, the upper and lower real limits of this interval.
4) $cf_L$, the cumulative frequency up to this interval.

Then,

$$Mdn = L_{50} + \frac{.5N - cf_L}{f_{50}}\,(U_{50} - L_{50}) \tag{5.17}$$


For example, consider the grouped frequency distribution of Histo-
gram B of Figure 5.6. Here,

$$N = 100, \qquad cf_L = 34, \qquad f_{50} = 42, \qquad U_{50} = 8.5, \qquad L_{50} = 5.5$$

Hence, for this distribution

$$Mdn = 5.5 + \frac{50 - 34}{42}\,(8.5 - 5.5) = 5.5 + \frac{16}{42} \times 3 = 5.5 + \frac{8}{7} \approx 5.5 + 1.1 = 6.6$$


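Formula (5.17) simply expresses in terms of algebraic or mathematical
symbolism the steps to be taken in computing the median or 50th percentile
($P_{50}$) as explained or developed for the computation of any percentile
point $P_x$ in Section 4.7 (see Examples 4.6 and 4.7). Hence, (5.17) is seen
to be simply a special case of the more general formula,

$$P_x = L_x + \frac{(x/100)N - cf_L}{f_x}\,(U_x - L_x) \tag{5.18}$$

which may be used to compute any percentile point, $P_x$. Here $f_x$, $U_x$,
and $L_x$ are the frequency and the real limits of the interval containing
$P_x$, and $cf_L$ is the cumulative frequency up to that interval.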
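The point that only a few facts about the distribution are needed can be made concrete in code. The sketch below is illustrative only: the function and parameter names are hypothetical, and it assumes the general percentile formula has the same form as (5.17) with $(x/100)N$ in place of $.5N$. The caller supplies the facts for the interval containing the desired point; nothing else about the distribution is consulted.

```python
def percentile_point(x, n, f, lower, upper, cf_below):
    """Percentile point P_x from the facts about its interval.

    x        : percentile wanted (50 for the median)
    n        : total number of scores, N
    f        : frequency of the interval containing P_x
    lower    : lower real limit of that interval
    upper    : upper real limit of that interval
    cf_below : cumulative frequency up to that interval
    """
    target = x / 100 * n  # number of scores that must lie below P_x
    return lower + (target - cf_below) / f * (upper - lower)


def median_from_facts(n, f50, lower50, upper50, cf_below):
    """Formula (5.17): the median as the special case x = 50."""
    return percentile_point(50, n, f50, lower50, upper50, cf_below)


# The four facts for Histogram B of Figure 5.6.
mdn = median_from_facts(n=100, f50=42, lower50=5.5, upper50=8.5, cf_below=34)
print(round(mdn, 1))  # 6.6
```

Note that the frequencies of all the other intervals, and the exact values of the scores inside them, never enter the calculation, which is the contrast with the mean drawn in the text.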




MEASURES OF VARIABILITY 


6.1 INTRODUCTION 


It should be readily apparent that considered alone, an average—that 
is, a measure of central tendency or group location—can describe only one 
of the important characteristics of a distribution of scores. It is often 
equally essential to know how compactly the scores are distributed about 
this point of location or, conversely, how far they are scattered away from 
it. For example, in describing the distribution of intelligence for a given 
class of pupils, it would be insufficient to know only the average IQ of the 
class. For instructional purposes it is equally if not more important to 
know how large are individual differences in intelligence. In other words, 
we should like to know whether the class is made up exclusively of students 
of average and near-average intelligence or contains a large proportion of 
extremely bright and extremely dull pupils. This characteristic of a dis- 
tribution of scores is variously referred to as dispersion, spread, scatter, 
deviation, and variability. In this chapter we shall define and discuss several 
quantitative indexes of variation. First, however, we shall suggest two 
very important uses to which such indexes may be put. 

Apart from providing simply a quantitative index of the degree of 
variation among the scores of a particular collection, and the obvious 
necessity of such an index if we wish to compare the degree of variation in 
two or more collections, a very important application of variability arises 
in connection with the study of the accuracy of certain measuring or 
estimating procedures. Consider the problem of measuring the amount 




of some continuous trait possessed by some individual or object. We have
explained in Section 2.2 how it is impossible to measure the true amount
of a continuous trait that is possessed by a given object and that all such
measurements are, therefore, approximate. This being the case, it is, of
course, impossible to study errors of measurement by the obvious device
of comparing obtained (measured) and true amounts. An analogous situation
arises when it is desired to estimate some population characteristic by
studying a sample taken from the population. For example, suppose it is 
desired to estimate the mean IQ for all children in the United States of age 
three through age fifteen by obtaining as the estimate the mean of a sample 
of children taken from this population. Since the determination of the IQ 
of all the children in the population is a practical impossibility, the true 
or population mean can never be known, and again it is impossible to in- 
vestigate error by comparing the obtained and true values. How then, in 
any situation involving the approximation of an ever remaining unknown 
true value, can error be investigated? 

One possible method of attack consists in making a number of inde-
pendent repetitions of the measurement or estimating procedure. Then,
if it can be assumed that the procedure does not give rise to systematic
error (i.e., is free from bias), the variation in the values thus obtained pro-
vides a basis for assessing the accuracy of the procedure. If the values
arising from a number of repetitions are in close agreement, then the
procedure may be regarded as an accurate one. On the other hand, if the
values differ markedly there can be little confidence in its accuracy. Obvi-
ously then, some index of variability applied to a collection of values result-
ing from a number of independent repetitions of a measuring or estimating
procedure provides, in turn, a quantitative index of the accuracy of the
procedure. Comparison of such indexes for different measuring or estimat-
ing procedures provides a basis for evaluating their relative accuracy. As
will be seen in later chapters dealing with sampling-error theory, this
application of an index of variability to the results obtained from a number
of independent repetitions of the sampling procedure is a most important
one, essential to the usefulness of the theory.

A second important application of indexes of variability is as the basic
unit in a derived or new measuring scale. In Section 4.1, it was pointed
out that scores yielded by most psychological and educational tests had
little, if any, absolute significance and that such scores were consequently
useful only in describing an individual's relative status within a given group.
Chapter 4 treated in detail one device, the percentile rank, for attaching
meaning to such scores. Another device, the standard score, involves the
use of an index of variability as a unit in a new scale. Since this device is
treated in some detail in the next chapter no attempt to develop it will be
made at this point. Instead we shall turn attention directly to some of the
quantitative indexes of variability which are in common use.




6.2 THE RANGE TYPE OF INDEX


In a general way it should be observed that while any quantitative
index of location (i.e., any average) is necessarily a point on the score scale,
any meaningful index of variability must be a distance along the score scale.
This distance will be small or large as the variability in the score values is
small or large. A distance sometimes used as an index of variability is that
from the smallest to the largest score in the collection. If the scores of a
collection are compactly or homogeneously distributed, that is to say,
much alike in magnitude, then the distance from the smallest to the largest
score will be much less than the corresponding distance for a collection of
scores which differ markedly in magnitude. This distance is known as the
range. If the smallest score of a collection is represented by S and the
largest score by L, then the range, R, is defined by

$$R = L - S \tag{6.1}$$


While this index has the advantage of great simplicity, it is weak in 
the sense that it ignores or fails to take into account any of the distances 
between scores except that between the smallest and largest. Between 
these extreme scores almost anything could be true of the distribution, that 
is, all the other scores may or may not be very compactly distributed. 

This weakness may be lessened to some extent by the use of the distance
(i.e., range) between some pair of score values other than the smallest and
largest. Two such ranges were suggested in Section 4.13. These were the
range (distance) from $Q_1$ (the first quartile) to $Q_3$ (the third quartile) and
the range from $D_1$ (the first decile) to $D_9$ (the ninth decile). While we are
less likely to be misled by these ranges than by the one defined in (6.1), the
fact remains that they still fail to take into account much of the total
available information regarding variability.

For a reason which will be mentioned in the following section, one-half
the range from $Q_1$ to $Q_3$ is sometimes used as an index of variability. This
index, known as the semi-interquartile range (Q), is defined by

$$Q = \frac{Q_3 - Q_1}{2} \tag{6.2}$$


It should be obvious that insofar as utilization of available information
on variability is concerned, Q is no better than $Q_3 - Q_1$. In the next section
we shall consider a different approach to the problem of devising indexes
of variability. This approach will not be as simple as that yielded by the
use of ranges, but it is capable of providing indexes that make more com-
plete use of the available information on variability. Ordinarily, range
indexes are useful only in situations in which a rather crude indication of
variability is sufficient for the purpose of the particular analysis.
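The weakness of the range type of index is easy to see with a small numerical case. The following Python sketch is an illustration under stated assumptions: the function names, the two score collections, and the quartile values passed to the second function are all hypothetical and do not come from the text.

```python
def score_range(scores):
    """Range R = L - S, as in (6.1)."""
    return max(scores) - min(scores)


def semi_interquartile_range(q1, q3):
    """Semi-interquartile range Q = (Q3 - Q1) / 2, as in (6.2)."""
    return (q3 - q1) / 2


# Two hypothetical collections with the same smallest and largest scores.
spread_out = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30]
bunched = [10, 20, 20, 20, 20, 20, 20, 20, 20, 20, 30]

print(score_range(spread_out), score_range(bunched))  # 20 20

# Hypothetical quartile points, supplied directly for illustration.
print(semi_interquartile_range(q1=15.0, q3=25.0))  # 5.0
```

The two collections differ greatly in how compactly the interior scores are distributed, yet the range reports the same value for both, since it depends only on the two extreme scores.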




6.3 THE DEVIATION TYPE OF INDEX


Another type of distance value which is indicative of variability is the
average of the distances of certain score values from some central point
(average). Suppose, for example, that we determine the mean of the dis-
tances of $Q_1$ and $Q_3$ from the median. These distances are respectively

$$Mdn - Q_1 \qquad \text{and} \qquad Q_3 - Mdn$$

Now adding these distances and dividing by two to find their mean
we obtain

$$\frac{(Q_3 - Mdn) + (Mdn - Q_1)}{2} = \frac{Q_3 - Q_1}{2}$$

Thus we see that the semi-interquartile range defined in (6.2) is simply
the mean of the distances of $Q_3$ and $Q_1$ from the median (actually any point
between $Q_1$ and $Q_3$ would serve as well as the median).

The mere change from a range to an average-distance approach in
deriving Q, obviously, cannot in any way alter the usefulness or meaning-
fulness of Q as an index of variability. In applying the distance approach,
however, there is no need to limit the number of score values involved to
two, as was done in the case of Q. There is no reason, in fact, why the
average distance could not be made to involve all the score values and
thereby take into more complete account the information on variability
contained in the data.

Figure 6.1 Variable (A) and compact (B) hypothetical score distributions

By way of a simple example the histograms of two hypothetical score
distributions are shown in Figure 6.1. The scores of Distribution A are
clearly more widely dispersed (more variable) than those of B. Both
distributions have means of 20. We shall arbitrarily use this value as a
central point from which to measure the distances of the scores. For
example, in Distribution A the lowest score, 15, is five units away from 20,




and the next two scores, both 17, are each three units from 20. Beginning
with the lowest score, the distances from 20 of the ten scores in Distribution
A are respectively 5, 3, 3, 1, 1, 1, 1, 3, 6, and 6. The sum of these ten
distances is 30 so that the mean distance is 3. Similarly, the distances from
20 of the ten scores of Distribution B are 1, 1, 1, 0, 0, 0, 0, 1, 1, and 1. The
total of these distances is 6 so that for Distribution B the mean distance is
only 0.6, a value one-fifth as large as that obtained for the more variable
Distribution A.
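The arithmetic of the mean distance can be reproduced in a few lines. In this illustrative Python sketch the function name is hypothetical, and the two score lists are an assumption: they are reconstructed from the distances listed in the text, taking 20 as the central point. The central point is passed explicitly, following the text's use of 20 as the point from which distances are measured.

```python
def mean_distance(scores, center):
    """Mean of the absolute distances of the scores from a central point."""
    return sum(abs(x - center) for x in scores) / len(scores)


# Scores implied by the distances listed for Figure 6.1 (an assumption).
dist_a = [15, 17, 17, 19, 19, 21, 21, 23, 26, 26]
dist_b = [19, 19, 19, 20, 20, 20, 20, 21, 21, 21]

md_a = mean_distance(dist_a, center=20)
md_b = mean_distance(dist_b, center=20)

print(md_a)  # 3.0
print(md_b)  # 0.6
```

Unlike a range, this index changes whenever any single score moves nearer to or farther from the central point.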

In the terminology of statistics the distance of a score from a central
point is called a deviation, and the index of variability just described is,
therefore, known as the mean deviation. Symbolically, the mean deviation
may be defined as follows:

$$MD = \frac{\sum |X_i - \bar{X}|}{N} = \frac{\sum |x_i|}{N} \tag{6.3}$$

where $N$ = the number of scores,
$X_i$ = any score value, and
$\bar{X}$ = the mean of the collection

The vertical bars in (6.3) indicate that only the absolute values of the
deviations are involved. That is, the direction of the score value, $X_i$, from
the mean, $\bar{X}$, is ignored. This direction is, of course, indicated by the sign
of the $X_i - \bar{X}$ difference. In Distribution B of Figure 6.1, for example,
the deviation of a score of 19 from the mean, 20, is 19 − 20 or −1, while
the deviation of a score of 21 is 21 − 20 or +1. The vertical bars indicate
that only the numerical or absolute values of such deviations are to be 
considered. Reference to (5.10) should reveal at once why it was necessary 
to ignore the signs of the deviations in defining the mean deviation, for had 
the sign been retained, their sum, and hence their mean, would always be 
zero regardless of the variability of the scores involved. 

A notational practice first introduced in Section 5.8 is again used in 
(6.3). Since this practice, which is widespread, will be used throughout this
book, it is important that the student have it thoroughly in mind. The 
practice referred to is that of representing any score in a collection by an 
upper-case letter and its deviation from the mean of the collection by the 
corresponding lower-case letter. 

In the mean deviation we have an index of variability which clearly
takes more thoroughly into account the information on variability con-
tained in the data than does any range type of index. Indeed, if our sole
purpose in determining an index of variability were simply to describe the
extent to which the scores of a collection are dispersed or scattered along
the score scale, we would look no further. Unfortunately, however, the
mean deviation, due to the involvement of absolute values, has proven to
be most stubborn, if not unmanageable, in the development of more com-
plicated statistical theory. This is particularly true of sampling-error




theory to which we referred in the first section of this chapter, and also of 
correlation theory. Both sampling-error and correlation theory are needed 
at a rather early stage in the study of statistics and are dealt with at some
length in subsequent chapters of this text. It is essential, therefore, that 
we introduce at this point an index of variability which is free from the 
involvement of absolute values, and hence more tractable in the develop- 
ment of statistical theory. 

We have seen that the mean of the signed ($X_i - \bar{X}$) deviations must
necessarily be zero and is consequently useless as an index of variability.
Inasmuch as the product of two negative numbers is a positive number, this
difficulty can be circumvented by using as an index of variability the mean
of the squares of these deviations. While the squaring of each deviation is
an added complication it is obvious that such a mean-square deviation is
fully as sensitive to changes in variation as the mean deviation itself.
Consider, for example, the A and B distributions of Figure 6.1. Beginning
with the smallest score (i.e., 15) the deviations from the mean, 20, of the
ten scores of the A distribution are −5, −3, −3, −1, −1, +1, +1, +3,
+6, and +6 respectively. The squares of these deviations are +25, +9,
+9, +1, +1, +1, +1, +9, +36, and +36. The sum of these squares is
128 and consequently their mean is 12.8. In the case of the B distribution,
on the other hand, the deviations are −1, −1, −1, 0, 0, 0, 0, +1, +1, +1.
The squares are +1, +1, +1, 0, 0, 0, 0, +1, +1, +1, and the mean of
these squares is 0.6, a value less than one-twentieth of that obtained in the
case of the much more variable A distribution.

The index we have just described is known as the variance. It is repre-
sented by a variety of symbols. Among the more common are $V$, $s^2$, and
$\sigma^2$. In the study of sampling theory, some of the results may be somewhat
more simply stated if the variance of a sample is defined as the sum of the
squares of the deviations divided by one less than their number (i.e., by
N − 1), rather than as the mean of the squared deviations. For this reason
many writers have elected to define variance as the sum of the squared
deviations divided by N − 1. These writers have commonly adopted the
symbol $s^2$ to represent variance thus defined. For reasons of personal
pedagogical preference we shall not define variance in this way. To mini-
mize the possible confusion a student may experience when referring to
sources in which variance is thus differently defined we shall refrain from
the use of the symbol $s^2$ in this book. Instead we shall employ the German
letter "ess" ($\mathfrak{s}^2$) except in situations, to occur later, in which a need for
distinction between sample and population variance arises. In such situa-
tions we shall represent the population variance by $\sigma^2$. A symbolic state-
ment of the definition of variance follows.

$$\mathfrak{s}^2 = \frac{\sum x_i^2}{N} \qquad (x_i = X_i - \bar{X}) \tag{6.4}$$




A disadvantage of the variance in certain applications is the fact that
it is not a value in units of the original score scale. For example, if the
original measures are in units of inches, then the squaring of the deviations
produces a series of numbers representing square inches, and the variance,
which is the mean of these numbers, is, therefore, also a value expressed in
terms of square inches. In general, the variance is expressed in units which
are the squares of those of the scores involved and consequently, unlike
the other measures of variability considered, it cannot be interpreted as a
distance along the score scale. This is a characteristic of the variance,
however, which is easily modified. To return the index to the original scale
it is only necessary to extract its square root. The resulting index of
variability which is amenable to interpretation as a distance along the
original score scale is known as the standard deviation. Symbolically, its
definition may be written

$$\mathfrak{s} = \sqrt{\frac{\sum x_i^2}{N}} \qquad (x_i = X_i - \bar{X}) \tag{6.5}$$


The standard deviation of Distribution A of Figure 6.1 is simply the 
square root of its variance 12.8, or 3.58. The standard deviation of Distri- 
bution B is the square root of 0.6, or 0.77, a value between one-fourth and 
one-fifth as large as that of the more variable Distribution A. 
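These values can be reproduced directly from the definitions. The Python sketch below is illustrative only: the function names are hypothetical, the score lists are reconstructed from the deviations listed in the text (an assumption), and deviations are taken from 20, the value the text uses as the central point for both distributions.

```python
import math


def mean_square_deviation(scores, center):
    """Mean of the squared deviations of the scores from a central point."""
    return sum((x - center) ** 2 for x in scores) / len(scores)


def root_mean_square_deviation(scores, center):
    """Square root of the mean-square deviation, back on the score scale."""
    return math.sqrt(mean_square_deviation(scores, center))


# Scores implied by the deviations listed for Figure 6.1 (an assumption).
dist_a = [15, 17, 17, 19, 19, 21, 21, 23, 26, 26]
dist_b = [19, 19, 19, 20, 20, 20, 20, 21, 21, 21]

var_a = mean_square_deviation(dist_a, 20)
var_b = mean_square_deviation(dist_b, 20)
sd_a = root_mean_square_deviation(dist_a, 20)
sd_b = root_mean_square_deviation(dist_b, 20)

print(var_a, round(sd_a, 2))  # 12.8 3.58
print(var_b, round(sd_b, 2))  # 0.6 0.77
```

The ratio of the two standard deviations is about 0.22, which lies between one-fifth and one-fourth as the text states, whereas the ratio of the two variances is under one-twentieth. Taking the square root restores a distance comparable to the mean deviation.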

The standard deviation is by far the most important and most widely 
used index of variability. It makes complete use of the information on 
variability contained in the data, and is quite manageable mathematically,
a characteristic of great importance in the development of statistical
theory. The cost of these advantages is, primarily, a loss in simplicity. 
Unless, however, the situation is such that a crude assessment of variability 
will suffice, the standard deviation should be used in preference to the 
various types of range indexes. The mathematical intractability of the 
mean deviation has led to its virtual abandonment as an index of variability. 
In fact, it has been discussed here only because it provides a logical approach 
to the presentation of the standard deviation. 


6.4 COMPUTATION OF VARIANCE AND STANDARD DEVIATION 


To illustrate the computation of the variance and standard deviation
of a set of scores we shall use the 50 scores made by 50 subjects on a 25-
word anticipation test which were reported in Table 5.1. The mean of
these 50 scores is 11.08 (see Section 5.5). To compute the variance of this
set of scores we may follow directly the instructions of the symbolic state-
ment of the definition given in (6.4).* That is,

$$\mathfrak{s}^2 = \frac{\sum x_i^2}{N} = \frac{(18 - 11.08)^2 + (10 - 11.08)^2 + \cdots + (16 - 11.08)^2}{50}$$

$$= \frac{(6.92)^2 + (-1.08)^2 + \cdots + (4.92)^2}{50}$$

$$= \frac{47.8864 + 1.1664 + \cdots + 24.2064}{50} = \frac{589.6800}{50} = 11.7936$$

The standard deviation of this distribution is, therefore,

$$\mathfrak{s} = \sqrt{11.7936} = 3.43$$

* It is suggested that in his own writing the student use instead of this
German ess the more easily written lower-case script ess.

It is apparent that the direct computation (i.e., computation according
to the definition) of the variance is a tedious task. When the mean involves
a decimal fraction the deviations are not only awkward to obtain but are
troublesome to square.* Fortunately it is possible to obtain the sum of the
squares of the deviations of the scores from their mean without actually
finding the deviations.

We shall present the rule for obtaining the needed sum of squares both
verbally and symbolically. Then we shall verify it in the case of a specific
example. Finally we shall provide a general proof. This rule is among the
most useful of all elementary statistical rules. While the non-mathematical
student may wish to omit study of its proof, it is essential that all students
understand the statement of this rule and master its application.

RULE 6.1. The sum of the squares of the deviations of the scores in a
collection from the mean of the collection is given by the difference between the
sum of the squares of the scores and the square of the sum of the scores divided
by their number. Or symbolically,

$$\sum x_i^2 = \sum X_i^2 - \frac{(\sum X_i)^2}{N} \qquad (x_i = X_i - \bar{X}) \tag{6.6}$$


It is important to understand the difference between $\sum X_i^2$ and $(\sum X_i)^2$.
The first of these expressions represents the quantity obtained when each
of the N scores is first squared and then these squares summed. The second
represents the quantity obtained when the N scores are first summed and
then the resulting sum squared. The distinction between these two expres-
sions is misunderstood by many beginning students. Careful consideration
of the following example may be helpful in overcoming this difficulty.

Example 1. Consider the following 10 scores (here N = 10):

12, 7, 13, 13, 5, 2, 8, 5, 5, 10

Applying the rule we have

$$\sum x_i^2 = (12)^2 + (7)^2 + \cdots + (10)^2 - \frac{(12 + 7 + \cdots + 10)^2}{10} = 774 - \frac{6400}{10} = 134$$

To verify this result we must first determine the mean, $\bar{X}$, of these 10
scores. This mean is

$$\bar{X} = \frac{80}{10} = 8$$

Next we determine the deviation from the mean (i.e., $x_i = X_i - 8$) of each
score. These deviations are 4, −1, 5, 5, −3, −6, 0, −3, −3, and 2. The
squares of these deviations are 16, 1, 25, 25, 9, 36, 0, 9, 9, and 4. Hence,

$$\sum x_i^2 = 16 + 1 + \cdots + 4 = 134$$

which is the quantity previously obtained by application of the rule.

* Of course, the use of a calculator or, in the absence of such equipment,
even the use of a table of squares (see Table I, Appendix C) will greatly
reduce the labor involved.
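The agreement between the rule and the definition can be checked mechanically. The following Python sketch is an illustration with hypothetical function names; it uses the ten scores of Example 1 and computes the sum of squared deviations both ways.

```python
def sum_sq_dev_direct(scores):
    """Sum of squared deviations, found from the deviations themselves."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


def sum_sq_dev_rule(scores):
    """Sum of squared deviations by Rule 6.1, without finding deviations."""
    n = len(scores)
    sum_of_squares = sum(x * x for x in scores)  # each score squared, then summed
    square_of_sum = sum(scores) ** 2             # scores summed, then squared
    return sum_of_squares - square_of_sum / n


scores = [12, 7, 13, 13, 5, 2, 8, 5, 5, 10]

print(sum(x * x for x in scores))  # 774
print(sum(scores) ** 2)            # 6400
print(sum_sq_dev_rule(scores))     # 134.0
print(sum_sq_dev_direct(scores))   # 134.0
```

The first two printed values make the distinction discussed above concrete: the sum of the squares is 774, while the square of the sum is 6400.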
Proof. Given a collection of N scores, $X_1, X_2, \ldots, X_N$. Let $X_i$ repre-
sent the value of any score in this collection [see (3.1) or (3.2)] and let $x_i$
represent its deviation from the mean $\bar{X}$. That is,

$$x_i = X_i - \bar{X}$$

Then

$$x_i^2 = (X_i - \bar{X})^2 = X_i^2 + \bar{X}^2 - 2\bar{X}X_i$$

Now summing all N such squares we obtain [see (3.19), (3.20), and (3.21)]

$$\sum x_i^2 = \sum X_i^2 + N\bar{X}^2 - 2\bar{X}\sum X_i \tag{1}$$

But by definition, (5.1),

$$\bar{X} = \frac{\sum X_i}{N} \tag{2}$$

Hence,

$$\bar{X}^2 = \frac{(\sum X_i)^2}{N^2}$$

Or if we multiply both members of this equality by N

$$N\bar{X}^2 = \frac{(\sum X_i)^2}{N} \tag{3}$$

Now substituting from (2) and (3) into (1) we have

$$\sum x_i^2 = \sum X_i^2 + \frac{(\sum X_i)^2}{N} - \frac{2(\sum X_i)^2}{N}$$

And upon combining terms we obtain

$$\sum x_i^2 = \sum X_i^2 - \frac{(\sum X_i)^2}{N}$$

which proves the rule.

We shall now show how this rule can be used to facilitate the computa-
tion of the variance (or standard deviation). If we substitute from (6.6)
into (6.4) we have

$$\mathfrak{s}^2 = \frac{\sum X_i^2}{N} - \left(\frac{\sum X_i}{N}\right)^2 \tag{6.7}$$

or

$$\mathfrak{s}^2 = \frac{\sum X_i^2}{N} - \bar{X}^2 \tag{6.8}$$

Thus, to obtain $\mathfrak{s}^2$ it is no longer necessary to carry out the tedious
process of determining the square of the amount by which each score
deviates from the mean. Instead we need simply (1) square each score,
(2) find the mean of these squares, and (3) subtract from it the square of
the mean of the scores. Applying this procedure to the data of Table 5.1
we have

$$\mathfrak{s}^2 = \frac{(18)^2 + (10)^2 + \cdots + (16)^2}{50} - (11.08)^2 = \frac{6728}{50} - (11.08)^2 = 134.56 - 122.7664 = 11.7936$$

which is identical with the result obtained earlier.
If for some reason the data are to be organized into a unit-interval
frequency distribution, it is usually more convenient to defer the computa-
tion of the variance until the frequency distribution is prepared. We have
already considered the computation of the mean of data organized into
such a frequency distribution (see Section 5.5). To illustrate the procedure
as it applies to the computation of the variance, the data of Table 5.1 have
been organized into a frequency distribution involving unit intervals. This
distribution is shown in Table 6.1. To find the $\sum X_i^2$ called for by either
(6.7) or (6.8) we first obtain the $X^2$ subtotal for each class just as we ob-
tained the $X$ subtotal for each class in computing $\bar{X}$. For example, the $X^2$
subtotal for the class 15 is 675 since there are three scores in this class
($f = 3$), and the sum $(15)^2 + (15)^2 + (15)^2$ is 675. It is, of course, more
efficient to use multiplication instead of addition to obtain the class sub-
totals, that is, simply to find the product of the class frequency ($f$) and the
square of the class value ($X^2$). In the case of the class 15, for example, we
have 3 × (15)² or 3 × 225 = 675. Obviously, the grand total of these sub-
totals for all classes is the required $\sum X_i^2$. Table 6.1 shows these subtotals
in the column headed $fX^2$. The $fX$ column of this same table gives the $X$
subtotal for each class, and the grand total for this column is the $\sum X_i$
required in computing the mean.

TABLE 6.1 Unit-Interval Frequency Distribution of 50 Scores
Given in Table 5.1

    X (Score)     f     fX     fX²
       21         1     21     441
       20         0      0       0
       19         0      0       0
       18         2     36     648
       17         1     17     289
       16         2     32     512
       15         3     45     675
       14         2     28     392
       13         3     39     507
       12         4     48     576
       11         6     66     726
       10         9     90     900
        9         7     63     567
        8         5     40     320
        7         2     14      98
        6         1      6      36
        5         1      5      25
        4         1      4      16
                 50    554    6728

Thus we have in the grand totals for the $fX$ and $fX^2$ columns of Table 6.1
all the information needed to apply (6.7) or (6.8). For example, applying
(6.7) we have

$$\mathfrak{s}^2 = \frac{6728}{50} - \left(\frac{554}{50}\right)^2 = 134.56 - 122.7664 = 11.7936$$

as before.

If we use the symbolic scheme for representing a frequency distribution
described in Section 3.4 the total of the N scores involved is as given
in (3.13), i.e., \(\sum f_j X_j\), and the total of the squares of these N
scores is as given in (3.16), i.e., \(\sum f_j X_j^2\). Adapting this
notation to formulas (6.7) and (6.8) we obtain the following computational
formulas for the variance; these formulas are directly applicable to data
organized into a frequency distribution.

\[ s^2 = \frac{\sum f_j X_j^2}{N} - \left(\frac{\sum f_j X_j}{N}\right)^2 \qquad (6.9) \]

\[ s^2 = \frac{\sum f_j X_j^2}{N} - \bar{X}^2 \qquad (6.10) \]

The application of formula (6.10) to our example is shown below:

\[ s^2 = \frac{(1)(21)^2 + (0)(20)^2 + (0)(19)^2 + \cdots + (1)(4)^2}{50} - (11.08)^2 \]

\[ = \frac{6728}{50} - (11.08)^2 = 134.56 - 122.7664 = 11.7936 \]
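
The arithmetic of (6.9) and (6.10) can also be checked mechanically. What follows is only a minimal sketch in Python and is not part of the original text: the dictionary `freq` transcribes the X and f columns of Table 6.1, and the function names are our own choices for illustration.

```python
# Unit-interval frequency distribution of Table 6.1 (score X -> frequency f).
freq = {21: 1, 20: 0, 19: 0, 18: 2, 17: 1, 16: 2, 15: 3, 14: 2, 13: 3,
        12: 4, 11: 6, 10: 9, 9: 7, 8: 5, 7: 2, 6: 1, 5: 1, 4: 1}


def column_totals(freq):
    """Return N, the fX total, and the fX^2 total of a frequency distribution."""
    n = sum(freq.values())
    sum_fx = sum(f * x for x, f in freq.items())
    sum_fx2 = sum(f * x * x for x, f in freq.items())
    return n, sum_fx, sum_fx2


def variance_6_9(freq):
    """Variance by (6.9): (sum of fX^2)/N minus the square of (sum of fX)/N."""
    n, sum_fx, sum_fx2 = column_totals(freq)
    return sum_fx2 / n - (sum_fx / n) ** 2


def variance_6_10(freq):
    """Variance by (6.10): (sum of fX^2)/N minus the square of the mean."""
    n, sum_fx, sum_fx2 = column_totals(freq)
    mean = sum_fx / n
    return sum_fx2 / n - mean ** 2


n, sum_fx, sum_fx2 = column_totals(freq)
print(n, sum_fx, sum_fx2)             # totals of the f, fX, and fX^2 columns
print(round(variance_6_9(freq), 4))   # variance by (6.9)
print(round(variance_6_10(freq), 4))  # variance by (6.10)
```

The three totals should agree with the bottom row of Table 6.1 (50, 554, and 6728), and both formulas should give 11.7936 apart from floating-point rounding.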




If the time spent in organizing the data into a frequency distribution 
is counted as part of the time spent in calculating the variance, it is doubt- 
ful if the use of (6.9) or (6.10) is much more efficient than the use of (6.7) 
or (6.8). If, however, the frequency distribution is to be prepared anyway 
for some other purpose, it is more efficient to employ (6.9) or (6.10). 

It is also possible to use (6.9) or (6.10) with a grouped frequency distri- 
bution, that is, with a frequency distribution the classes of which span 
more than one unit. In this case, however, the variance will be only an 
approximation of that obtained by (6.7) or (6.8), that is, of the variance of 
the original ungrouped scores. As has been previously suggested (see Sec- 
tions 2.5 and 3.5), the approximate character of a variance obtained by 
(6.9) or (6.10) applied to a grouped frequency distribution is due to the 
failure of the interval midpoints to represent with complete accuracy the 
values of the scores falling in the intervals. However, if the procedure 
suggested in Section 2.5 for selecting classes is appropriately followed,
variances may be computed from grouped frequency distributions with a 
degree of accuracy that is usually sufficient for most practical purposes. 
The use of (6.9) to compute the variance of the 50 anticipation test scores 
(see Table 6.1) organized into a grouped frequency distribution is illustrated 
in Table 6.2.* 


TABLE 6.2  Grouped Frequency Distribution of 50 Scores Given in Table 5.1
and Computation of s² Using (6.9)

CLASSES     X      f      fX     fX²
21-23      22      1      22     484
18-20      19      2      38     722
15-17      16      6      96    1536
12-14      13      9     117    1521
 9-11      10     22     220    2200
 6- 8       7      8      56     392
 3- 5       4      2       8      32
Totals            50     557    6887

\[ s^2 = \frac{6887}{50} - \left(\frac{557}{50}\right)^2 = 137.74 - 124.0996 = 13.6404 \]

*An interval of size 3 is actually too coarse for use with these data if
the distribution is to be used for computational purposes (see Section
2.5). It should be clearly understood that our use of an interval of this
size is for convenience of illustration only.

As has been previously stated, inaccuracies arising from the use of
grouped data are known as grouping errors. In the case of the variance and
standard deviation, grouping error is respectively defined as

\[ E_{s^2} = s_g^2 - s_o^2 \qquad (6.11) \]

and

\[ E_s = s_g - s_o \qquad (6.12) \]

where the g-subscript indicates the value derived from the grouped data
and the o-subscript indicates the value derived from the original unordered
scores or from a unit-interval frequency distribution.
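
The grouping of Table 6.2 and the grouping errors of (6.11) and (6.12) may be sketched in Python as follows. This is an illustration of ours rather than a procedure given in the text: the helper `group` assumes integer scores and classes of equal width, and all names are hypothetical.

```python
import math

# Unit-interval frequency distribution of Table 6.1 (score X -> frequency f).
freq = {21: 1, 20: 0, 19: 0, 18: 2, 17: 1, 16: 2, 15: 3, 14: 2, 13: 3,
        12: 4, 11: 6, 10: 9, 9: 7, 8: 5, 7: 2, 6: 1, 5: 1, 4: 1}


def group(freq, lowest_limit, width):
    """Collapse integer scores into classes `width` units wide, keyed by midpoint."""
    grouped = {}
    for x, f in freq.items():
        lower = lowest_limit + ((x - lowest_limit) // width) * width
        midpoint = lower + (width - 1) / 2   # e.g. class 3-5 has midpoint 4
        grouped[midpoint] = grouped.get(midpoint, 0) + f
    return grouped


def variance(freq):
    """Variance by (6.9), treating each key as the value of every score in its class."""
    n = sum(freq.values())
    sum_fx = sum(f * x for x, f in freq.items())
    sum_fx2 = sum(f * x * x for x, f in freq.items())
    return sum_fx2 / n - (sum_fx / n) ** 2


grouped = group(freq, lowest_limit=3, width=3)    # classes 3-5, 6-8, ..., 21-23
var_o = variance(freq)                            # original (unit-interval) scores
var_g = variance(grouped)                         # grouped data
error_var = var_g - var_o                         # grouping error by (6.11)
error_sd = math.sqrt(var_g) - math.sqrt(var_o)    # grouping error by (6.12)

print(grouped)
print(round(var_g, 4), round(error_var, 4), round(error_sd, 2))
```

The class frequencies produced by `group` should reproduce the f column of Table 6.2, and the grouped variance should agree with the 13.6404 obtained there, so that the uncorrected grouping error in the variance is 13.6404 − 11.7936 = 1.8468.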

In Section 5.5 the nature of grouping error in the case of the mean was 
considered at some length. We shall not discuss the problem as it applies 
to the variance in as much detail. However, it may be observed that unlike 
the grouping error associated with the mean of a symmetrical distribution, 
the error associated with the variance is systematic. In Section 5.5 it was 
pointed out that in using a grouped distribution to compute the mean of 
data that are basically continuous and unimodal, the values of the mid- 
points of the classes lying above the mode are too high to represent
accurately the scores falling in these classes. It was further noted, however,
that the midpoints of the classes below the mode are too low to represent 
the scores falling in the classes and that to the degree that the underlying 
distribution is symmetrical these two opposite types of errors tend to be 
compensating or cancelling in effect. Variance, on the other hand, indicates 
variability as measured by deviations from a central point (the mean). To 
use class midpoints which are either too high or too low for the score values 
they are intended to represent amounts in either case to using inflated 
deviations. Thus it follows that if the basic data are continuous and 
distributed in a bell-shaped pattern, then the variance computed from a 
grouped distribution will tend to be larger than the variance of the original
scores.


An adjustment known as Sheppard's correction is sometimes applied
to the variance computed from grouped data. This correction is presented
without further justification in (6.13) and (6.14).

\[ s_{corr}^2 = s_g^2 - \frac{h^2}{12} \qquad (6.13) \]

\[ s_{corr} = \sqrt{s_g^2 - \frac{h^2}{12}} \qquad (6.14) \]

where \(s_g^2\) = the variance computed from the grouped data, and
h = the size of the class interval.

The use of this adjustment or correction is strictly appropriate only
when the underlying distribution of the data is continuous and bell-shaped.
To whatever degree these conditions fail to be satisfied, the correction will
fail to be appropriate. Applying the correction to the variance computed
for the grouped distribution of Table 6.2, we have

\[ s_{corr}^2 = 13.6404 - \frac{(3)^2}{12} = 13.6404 - .75 = 12.8904 \]

and

\[ s_{corr} = \sqrt{12.8904} = 3.59 \]

Thus even after correction the grouping errors in this situation are
substantial.* In the case of the variance the grouping error after
correction is

\[ E_{s^2} = 12.8904 - 11.7936 = +1.0968 \]

while in the case of the standard deviation it is

\[ E_s = 3.59 - 3.43 = +0.16 \]
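
The correction and the errors remaining after it can be recomputed with a few lines of Python. This is a sketch of ours under the stated values only (grouped variance 13.6404, original variance 11.7936, interval size 3); the function name is hypothetical.

```python
import math


def sheppard_corrected_variance(grouped_variance, interval_size):
    """Subtract h^2/12 from the variance computed from grouped data."""
    return grouped_variance - interval_size ** 2 / 12


var_g = 13.6404    # variance from the grouped distribution of Table 6.2
var_o = 11.7936    # variance of the original scores
h = 3              # size of the class interval

var_corr = sheppard_corrected_variance(var_g, h)
sd_corr = math.sqrt(var_corr)

error_var = var_corr - var_o                 # error in the variance after correction
error_sd = sd_corr - math.sqrt(var_o)        # error in the standard deviation

print(round(var_corr, 4), round(sd_corr, 2))
print(round(error_var, 4), round(error_sd, 2))
```

Because the sketch carries the unrounded square roots, the error in the standard deviation comes to about 0.156 before rounding, which agrees with the 0.16 obtained above from the rounded values 3.59 and 3.43.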


*As has been previously noted, the grouping used in Table 6.2 is too coarse
to provide accurate computational results for the data involved.


6.5 SOME SIMPLE RULES REGARDING THE VARIANCE


In this section we shall present two simple rules regarding the variance.
Each will be stated both verbally and symbolically and will be verified in
the case of a simple numerical example. The proof of each is also given.
While some beginning students may wish to omit consideration of these
proofs, it is important that all students understand the meaning of the
relationships.

RULE 6.2. Let a constant, C, be added to each of N scores. Then the
variance of the new set of scores thus formed remains the same as the
variance of the original set. Or symbolically,

\[ s_{X+C}^2 = s_X^2 \qquad (6.15) \]

\[ s_{X+C} = s_X \qquad (6.16) \]

Example. Consider the 5 scores (i.e., N = 5) 16, 4, 12, 8, and 10, the
mean of which is 10. The deviations of these 5 scores from 10 are +6, −6,
+2, −2, and 0 respectively. The squares of these deviations are 36, 36,
4, 4, and 0 and the mean of these squares is 16. That is, the variance of
these 5 scores is 16 and the standard deviation is 4. Now let C = 7. Then
according to (6.15) and (6.16) the variance and the standard deviation of
the new set of scores formed by adding 7 to each of the given scores also
have the values 16 and 4 respectively. That is,

\[ s_{X+7}^2 = s_X^2 = 16 \]

and

\[ s_{X+7} = s_X = 4 \]


To verify these results we shall actually form the new set of scores and
determine its variance and standard deviation by direct application of
(6.4) and (6.5). The new set of scores is 23, 11, 19, 15, and 17. The mean
of this new set is 17. Hence,

\[ s_{X+7}^2 = \frac{(23 - 17)^2 + (11 - 17)^2 + (19 - 17)^2 + (15 - 17)^2 + (17 - 17)^2}{5} = \frac{80}{5} = 16 \]

and, of course,

\[ s_{X+7} = 4 \]

Or if C = −3, the variance and standard deviation of the new set still
remain 16 and 4. Verifying as before, the new set now becomes 13, 1, 9, 5,
and 7. The mean of this new set is 7. Hence,

\[ s_{X+(-3)}^2 = \frac{(13 - 7)^2 + (1 - 7)^2 + (9 - 7)^2 + (5 - 7)^2 + (7 - 7)^2}{5} = \frac{80}{5} = 16 \]

and,

\[ s_{X+(-3)} = 4 \]


Comment. For some reason this result appears to come as a surprise to 
many beginning students. Actually it is clearly reasonable and should come 
as an expected rather than a surprise result. To see the plausibility of 
this result it is necessary only to recall that the variance (or standard 
deviation) is an index of the degree to which the scores in a collection differ 
in magnitude and to note that such differences remain wholly unchanged 
when all scores in the collection are altered by a uniform amount. 


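Proof. By definition of variance [see (6.4)],

\[ s_{X+C}^2 = \frac{\sum [(X_i + C) - M_{X+C}]^2}{N} \]

But by (5.7)

\[ M_{X+C} = \bar{X} + C \]

Hence,

\[ s_{X+C}^2 = \frac{\sum (X_i + C - \bar{X} - C)^2}{N} = \frac{\sum (X_i - \bar{X})^2}{N} = s_X^2 \]

which proves the rule.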
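
Rule 6.2 is easily checked for the numerical example above. The following Python lines are a sketch of ours, with hypothetical function names, that applies the definition of variance directly to the original scores and to the scores formed with C = 7 and C = −3.

```python
import math


def variance(scores):
    """Mean of the squared deviations of the scores from their mean."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / n


def add_constant(scores, c):
    """Form the new set of scores X + C."""
    return [x + c for x in scores]


scores = [16, 4, 12, 8, 10]

for c in (0, 7, -3):
    new_scores = add_constant(scores, c)
    var = variance(new_scores)
    print(c, new_scores, var, math.sqrt(var))
```

For each value of C the variance and standard deviation printed should be 16 and 4, since adding C moves every score and the mean by the same amount and so leaves every deviation unchanged.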


RULE 6.3. Let each of N scores be multiplied by a constant amount C.
Then the variance of the new set of scores thus formed is equal to the variance
of the original set multiplied by the square of this amount. Or symbolically,

\[ s_{CX}^2 = C^2 s_X^2 \qquad (6.17) \]

RULE 6.3a.

\[ s_{CX} = C s_X \qquad (6.18) \]

Example. Again consider the 5 scores 16, 4, 12, 8, and 10, the variance
of which is 16. Now let C = 3. Then, according to (6.17) the variance of
the new set formed by multiplying each of these scores by 3 is

\[ s_{3X}^2 = (3)^2 (16) = 144 \]

and according to (6.18) the standard deviation is

\[ s_{3X} = (3)(4) = 12 \]

To verify this result we shall actually form the new set of scores and
determine its variance by application of (6.4). The new set is 48, 12, 36, 24,
and 30. The mean of this new set is 30 and hence,

\[ s_{3X}^2 = \frac{(48 - 30)^2 + (12 - 30)^2 + (36 - 30)^2 + (24 - 30)^2 + (30 - 30)^2}{5} = \frac{720}{5} = 144 \]

and

\[ s_{3X} = 12 \]

Or if C = 1/2, the variance and standard deviation of the new set as
given by (6.17) and (6.18) are

\[ s_{\frac{1}{2}X}^2 = (1/2)^2 (16) = 4 \]

and

\[ s_{\frac{1}{2}X} = (1/2)(4) = 2 \]

Verifying as before, the new set is 8, 2, 6, 4, and 5. The mean of this
new set is 5 and hence,

\[ s_{\frac{1}{2}X}^2 = \frac{(8 - 5)^2 + (2 - 5)^2 + (6 - 5)^2 + (4 - 5)^2 + (5 - 5)^2}{5} = \frac{20}{5} = 4 \]

and

\[ s_{\frac{1}{2}X} = 2 \]
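Proof. By definition of variance [see (6.4)],

\[ s_{CX}^2 = \frac{\sum (C X_i - M_{CX})^2}{N} \]

But by (5.8)

\[ M_{CX} = C\bar{X} \]

Hence,

\[ s_{CX}^2 = \frac{\sum (C X_i - C\bar{X})^2}{N} \]

Now removing the common factor C we have

\[ s_{CX}^2 = \frac{\sum [C (X_i - \bar{X})]^2}{N} = \frac{C^2 \sum (X_i - \bar{X})^2}{N} \qquad [\text{see } (3.19)] \]

\[ = C^2 s_X^2 \]

which proves the rule.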
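
Rule 6.3 and Rule 6.3a may be checked in the same way. The sketch below is ours and uses hypothetical function names; it forms the new scores for C = 3 and C = 1/2 and compares the variance obtained from the definition with the value given by (6.17).

```python
import math


def variance(scores):
    """Mean of the squared deviations of the scores from their mean."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / n


def multiply_by_constant(scores, c):
    """Form the new set of scores CX."""
    return [c * x for x in scores]


scores = [16, 4, 12, 8, 10]
var_x = variance(scores)            # variance of the original scores

for c in (3, 0.5):
    new_scores = multiply_by_constant(scores, c)
    direct = variance(new_scores)   # from the definition of variance
    by_rule = c ** 2 * var_x        # from (6.17)
    print(c, new_scores, direct, by_rule, math.sqrt(direct))
```

With C = 3 both computations should give a variance of 144 and a standard deviation of 12; with C = 1/2 they should give 4 and 2.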


6.6 COMPARISON OF Q AND s


In defining Q and s we have pointed out that s depends upon the exact
value of each score in the collection, whereas the determination of Q
requires only such information as is necessary to establish Q1 and Q3.* As a
consequence s takes into more complete account the information contained
in the data regarding variability. Both Q and s are expressed in terms of
the same units as the original scores and may be interpreted as distances
along the score scale.

*The minimum information needed to determine the median, Q2, is described
in Section 5.17. The minimum requirements for the determination of Q1 and
Q3 correspond to those of Q2.

Score distributions most frequently encountered in psychology and
education are unimodal with the frequencies diminishing in magnitude in
either direction from the mode, though not necessarily in a symmetrical
fashion. In such distributions the value of s always exceeds that of Q.
That this is the case may be seen from consideration of the smoothed
polygon of the hypothetical continuous symmetrical score distribution
shown in Figure 6.2 together with the smoothed polygon representing the
distribution of the deviations of the original scores from their mean (i.e.,
from 10). In the original score distribution Q1 is approximately 9.3 and Q3
approximately 10.7 so that Q is about 0.7. In the x-distribution, then
(x = X − 10, see dotted polygon), Q1 would be approximately −0.7 and
Q3 approximately +0.7 and Q as before would be 0.7.* Because the
distribution is symmetrical, Q may be viewed as the median of the upper half
of the x-distribution. The upper half of the x-distribution considered alone
is a positively skewed J-shaped distribution, and the mean of this J-shaped
distribution would necessarily be greater than its median (see Section 5.10).
Now s is the square root of the mean of the squares of all the x-values.
But, since the distribution is symmetrical, the \(\sum x^2\) for one-half is
the same as \(\sum x^2\) for the other half, and s may be regarded as the
square root of the mean of the squares of only those values comprising the
upper half of the distribution. Because the effect of squaring large numbers
is proportionately so much greater than that of squaring small numbers, it
follows that the square root of the mean of the squares of the x-values for
the upper half of the distribution is greater than the mean of these
x-values, which in turn we have already observed to be greater than Q.
Hence, it follows that s must be greater than Q in distributions of this
general type. In the distribution pictured in Figure 6.2 the value of s is
1.0 as compared with 0.7 for Q.

FIGURE 6.2  Smoothed polygons of a hypothetical continuous symmetrical
score distribution with \(\bar{X} = 10\) and of the distribution x = X − 10

*The addition of the same amount (−10) to each score has no effect upon
variability. See (6.16).

When extreme scores are involved, the difference in magnitude between
s and Q may become very marked owing to the fact that s is so much more
sensitive than Q to the presence of such scores. This, of course, follows from
the fact that s and Q behave in a manner comparable to the mean and
median, and from the fact that the mean is much more sensitive than the
median to the presence of extreme scores (see Section 5.10). The sensitivity
of s to the presence of extreme scores is a characteristic that is important to
keep in mind. As was true of the mean, the effect may be so marked in
cases of extreme skewness as to invalidate the use of s as a descriptive index.
To illustrate the sensitivity of s to extreme scores the values of s and Q
have been obtained for each of the score distributions shown in Figure 5.5.
These results are given in Table 6.3.

TABLE 6.3  Values of s and Q for the Distributions of Figure 5.5

DISTRIBUTION  s  Q
A. Unimodal, symmetrical  1.10  0.75
B. One extreme score at right
C. One extreme score at left  0.75
D. J-shaped  0.90

The difference between the values of s and Q for the unimodal symmetrical
Distribution A of Figure 5.5 is of about the same order of magnitude
as was noted in the case of the continuous unimodal symmetrical
distribution of Figure 6.2. Distribution B of Figure 5.5 is like A except that
one of the two highest scores of the A distribution is shifted to an extreme
position far up the scale (from a value of 85 to a value of 145). This
change of a single score had no effect upon the value of Q3 and hence none
upon the value of Q. The value of s, however, increased more than 12
times and, except for the single extreme score, exceeds twice the range of
the rest of the distribution. Distribution C is the mirror image of B, while
Distribution D is a positively skewed J-shaped distribution.
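
The contrast between s and Q in the presence of a single extreme score can be illustrated with a short Python sketch. The data below are hypothetical and are not those of Figure 5.5. The quartiles are taken from the standard library function `statistics.quantiles`, whose interpolation rule is an assumption of ours and need not agree exactly with the percentile procedure described earlier in this book.

```python
import statistics


def semi_interquartile_range(scores):
    """Q = (Q3 - Q1) / 2, with the quartile points from statistics.quantiles."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return (q3 - q1) / 2


# Hypothetical symmetrical collection of 16 scores with a mean of 10.
symmetrical = [7, 8, 8, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 12, 12, 13]
# The same collection with the single highest score moved far up the scale.
with_extreme = symmetrical[:-1] + [43]

for label, scores in (("symmetrical", symmetrical),
                      ("one extreme score", with_extreme)):
    s = statistics.pstdev(scores)             # standard deviation (divisor N)
    q = semi_interquartile_range(scores)
    print(label, round(s, 2), q)
```

In this sketch Q is the same for both collections, because moving the highest score does not disturb the quartile points, while s for the second collection is several times that for the first.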


It must, of course, be recognized that the distributions of Figure 5.5
are extreme hypothetical examples. For a comparison of the relative
magnitudes of s and Q in the case of skewed distributions that are more
realistic, attention is directed to the distributions shown in Tables 2.9 and
2.10. Table 2.9 shows the distribution of 1,000 individual incomes in
dollars for the year 1946. The value of s in this distribution is
approximately $3,450 as compared with $825 for Q. If a distance equal to Q is
marked off to either side of the mean of this distribution the resulting
section of the scale contains about 43 per cent of the distribution. If, on the
other hand, a distance equal to s is marked off to either side of the mean,
the section of scale thus established encompasses over 97 per cent of the
scores involved.


Table 2.10 shows the distribution of the numbers of years of service of
361 teachers in a certain city school system. This distribution is also
pictured graphically in Figure 2.10. For this distribution the values of Q and
s are approximately 6.7 years and 10.0 years respectively. The segment
of the scale from one Q below the mean to one Q above contains about
48 per cent of the distribution as compared with approximately 84 per cent
for the segment extending from one s below the mean to one s above.
Consideration of the histogram of this distribution (see Figure 2.10) suggests
that for distributions of this type a single index of variability may not be
as useful as several interpercentile distances. It is clear from the graph
that the measures are quite compactly distributed over the lower portion
of the scale and widely scattered over the upper portion. Inspection of
several selected percentiles would reveal this situation, whereas
consideration of s or Q would not. Table 6.4 gives approximate values of selected
percentiles for the distributions of Tables 2.9 and 2.10. In each case the
fact that Q3 and P95 are much further above the median than Q1 and P5
are below it, indicates a highly variable upper portion of the distribution
and a highly compact lower portion. Thus we not only have information
about the variability of the distributions not revealed by s or Q but we
also have information regarding their form (see Sections 4.12 and 4.13).

TABLE 6.4  Approximate Values of Selected Percentile Points for
Distributions of Tables 2.9 and 2.10

PR    DISTRIBUTION OF TABLE 2.9    DISTRIBUTION OF TABLE 2.10
25
 5    $110


6.7 USES OF MEASURES OF VARIABILITY: COMPARING VARIABILITY


An obvious use of a quantitative index of variability is in comparing 
the relative degree of variability among the individuals in two groups with 
regard to some trait. It should be equally obvious that this application is 
possible only if the trait scores are expressed in terms of the same unit of 
measure for both groups. 

Suppose, for example, that the standard deviations of the heights of
two groups of children are reported as 2 and 4 respectively. Clearly, no 
one would contend that the second group was twice as variable as the first 
if it were known that the height scores for the second group were expressed 
in centimeters while those for the first group were expressed in inches. Yet 
it is not uncommon for beginning students to infer, say, that a group of 
children is twice as variable in ability to read as it is in ability to solve 
arithmetic problems, simply because the standard deviation of their scores
on a given reading test is twice that of their scores on some arithmetic test. 
In so doing they ignore completely the possibility of a total lack of com- 
parability between the two measuring scales involved. 

The use of indexes of variability as a basis for comparing the relative 
degree of variation in two collections of scores is illustrated in the following 
section.


6.8 USES OF MEASURES OF VARIABILITY: RELIABILITY OF
MEASUREMENT OR ESTIMATE


It was suggested in Section 6.1 that one of the more important applica- 
tions of indexes of variability is in the study of errors of measurement or 
estimation in situations in which the true value being estimated is unknown. 
It was observed there that such situations always arise in the measure- 
ment of continuous attributes or in attempts to determine, or “estimate,” 
some population fact by means of a sample. Since the true value is un- 
known, we cannot study error by the obvious device of noting the difference 
between estimated and true value. However, if the estimating technique is 
free from bias—i.e., is just as likely to produce an overestimate as an underesti- 
mate—the accuracy of the technique may be investigated by studying the 
extent to which re-estimates produce essentially the same result. If the
magnitudes of independent estimates of the same true value vary widely,
the estimating technique must be regarded as inaccurate. On the other 
hand, if the technique yields estimates which are in close agreement, the 
technique must be recognized as an accurate one. Thus some index of the 
variability (e.g., s or Q) of estimates of the same true value obtained by
independent applications of the same estimating or measuring technique 
constitutes a quantitative index of the accuracy of this technique. A com- 
parison of such index values for two or more different techniques or pro- 
cedures for estimating the same true value provides a basis for evaluating 
the relative accuracy of the procedures.

By way of illustration let us suppose that it is desired to know in ad- 
vance of an election the proportion of eligible voters in the United States 
who favor presidential Candidate A over Candidate B. It is, of course, a 
practical impossibility to question each eligible voter in advance of the 
election in order to determine whether or not he prefers A over B. Hence, 
the true value of the required proportion can never be predicted and some
estimate of it, based upon only a small portion (i.e., a sample) of the entire 
population of eligible voters, will necessarily have to do. Suppose that it is 
decided to use a "sample" of 1,000 eligible voters and that some method 
of selecting this sample has been invented which is free from bias. This 
means that while this method of sample selection would not, if repeated, 
lead to the selection of precisely the same individuals, it would nevertheless 
produce estimates of the true proportion which would not differ from it
any more in one direction than in the other. Let us suppose that by means 
of this selection technique 1,000 eligible voters have been identified and 
asked for their preference between A and B, and that, of these, 485 or .485
favored A. This, of course, represents only an estimate of the true proportion
favoring A and the actual magnitude of the error involved cannot be
determined in advance of the election. 


Now ordinarily this is the only sample that we would select. That is,
we would stand or fall on the accuracy of this estimate, for if we could 
afford to study more eligible voters we would undoubtedly prefer to expand 
the size of our sample and thereby improve the accuracy of the estimate, 
rather than to obtain additional independent estimates of this same true 
proportion simply to enable us to make some statement about the degree to 
which they vary. Just what may be done in a situation of this type to 
enable us to base our estimates on all individuals selected and yet obtain 
some indication of the degree to which several independent determinations 
of such estimates would vary, is the subject of a later chapter. For purposes 
of completing our illustration of the points in question we shall turn from 
the practical example of polling preference for presidential candidates to an 
analogous but purely hypothetical situation.


Suppose that instead of a population of eligible voters, we have a large
collection of beads which are alike except for the fact that some are white 




and some are red. Suppose further that it is desired to estimate the propor- 
tion which are red by means of a sampling procedure known to be free from 
bias. To provide a basis for assessing the accuracy of this procedure we 
shall repeat it a number of times, thus obtaining a number of estimates of 
the same true value. The standard deviation of these estimates provides a 
quantitative estimate of the accuracy of the estimating (actually the 
sample-selecting) procedure. Quantitatively this index is inversely related 
to accuracy. That is, a large value of this standard deviation implies 
marked variation in estimated values and consequent inaccuracy, while a 
small value implies close agreement among the estimated values and a 
high degree of accuracy. 

This experiment was actually conducted as described on a small scale. 
First, 25 samples each containing 50 beads were selected by a purely chance 
or random procedure which would be free from bias. The proportion of 
red beads was determined for each sample, so that 25 independent estimates 
of the true proportion of red beads in the “population” were available. 
Then 25 samples, each containing 100 beads, were selected by the same 
procedure and used to provide 25 other estimates of the actual proportion 
of red beads in the "population." Now, obviously, samples of 100 beads 
should provide estimates which are more accurate than those based on 
samples of 50 beads. Consequently, we may predict that the standard
deviation of the 25 estimates based on the samples of 100 beads will be
smaller than the standard deviation of the 25 estimates based on the
samples of 50 beads. Thus we have an illustration of the use of an index
of variability both as an indicator of the accuracy of a particular estimating
procedure and as a basis for comparing the accuracy of two estimating
procedures.

FIGURE 6.3  Histograms showing distributions of estimates of the true
proportion of red beads in a collection of red and white beads for samples
of 50 and 100 beads. Upper histogram, 25 estimates based on samples of 50
beads: R = .28, Q = .051, s = .072. Lower histogram, 25 estimates based on
samples of 100 beads: R = .19, Q = .035, s = .048. The horizontal scale of
both histograms runs from .30 to .65.

The results of this experiment are presented in Figure 6.3. In this 
figure the upper and lower histograms picture the distributions of estimates 
based on samples consisting of 50 and 100 beads respectively. It is clear 
that the estimates based on samples of 50 beads vary more than the 
estimates based on samples of 100. The range (R), semi-interquartile range
(Q), and standard deviation (s) for each distribution are also shown in
Figure 6.3. Regardless of which of these indexes of variability is used as
a basis for comparison, it is clear, as was predicted, that estimates based on
the larger samples are less variable and hence, more accurate. 
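
An experiment of this kind is easily imitated by simulation. The Python sketch below is ours and rests on stated assumptions: the true proportion of red beads (`P_RED = 0.5`) is hypothetical, since the text does not report it, and each bead is drawn independently, which treats the collection as indefinitely large. The function name and the seed are likewise our own.

```python
import random
import statistics


def proportion_estimates(sample_size, n_samples, p_red, rng):
    """Draw n_samples random samples and return the proportion of red in each."""
    estimates = []
    for _ in range(n_samples):
        reds = sum(1 for _ in range(sample_size) if rng.random() < p_red)
        estimates.append(reds / sample_size)
    return estimates


P_RED = 0.5                  # hypothetical true proportion of red beads
rng = random.Random(1960)    # fixed seed so that a run can be repeated

for sample_size in (50, 100):
    estimates = proportion_estimates(sample_size, 25, P_RED, rng)
    spread = max(estimates) - min(estimates)      # range, R
    s = statistics.pstdev(estimates)              # standard deviation, s
    print(sample_size, round(spread, 2), round(s, 3))
```

With only 25 samples of each size the indexes themselves vary somewhat from run to run, so any single run should be read with caution. Raising the number of samples (for example to several thousand) makes the smaller standard deviation of the estimates based on samples of 100 beads appear consistently.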




STANDARD SCORES 


7.1 INTRODUCTION 


We have previously noted (Chapter 4) that many of the scales used in 
education and psychology are rank-order scales which yield scores that 
have little or no absolute significance and that are not directly comparable 
from scale to scale. We have also noted that even scores (measures) derived 
from fundamental scales become more meaningful when considered in 
relation to a collection of such scores obtained for some reference group of 
objects or individuals. The interpretation of a score, therefore, either re- 
quires or is enhanced by the derivation of some measurement of its place- 
ment or position in a reference collection. One of the most widely used of 
such derived measures is the percentile rank. In this chapter we shall 
consider another scheme or technique for indicating the position of a score 
in a reference distribution. 


7.2 THE CONCEPT OF STANDARD SCORES


The percentile rank indicates the placement of a score in a distribution 
by stating the percentage of scores that are smaller. Another possible 
approach might consist in indicating the placement of a score by reporting
its location with reference to a central point such as the mean. Suppose,
for example, that the mean of a certain score distribution is 80. A score of
72 in this distribution might be reported as −8 indicating a value eight
score points below the mean. Or a raw score of 86 might be reported as +6
indicating a value six score points above the mean.

Another method of imparting this information consists of adjusting
the scores of a collection so as to change their mean to some standard value.
Such an adjustment might simply consist of adding some constant amount
to each score. Suppose, for example, that it is decided to use 100 as the
standard value for the mean. If the mean of the original score values is 80
it is necessary only to add 20 to each score to form a new collection with
the mean having this desired standard value. Scores of 72 and 86 in the
original collection assume values of 92 and 106 respectively, in the new
collection. Since it is known that the mean of the new collection has the
standard value 100, scores of 92 and 106 are immediately recognized as
being respectively 8 points below and 6 points above the mean.
Such a scheme obviously results in score values that embody some in- 
formation not contained in the original scores, namely, information regard- 
ing location with reference to the mean of the distribution. While some 
gain has thus been achieved, the meaningfulness of such scores remains 
clouded by failure to relate them to the variability of the distribution 
involved. If, for example, the distribution is quite homogeneous so that 
most of the scores are crowded closely about the mean (100) a score of 92
may represent an extremely low value in relation to the other scores. On 
the other hand, if the distribution is highly variable, much of that part of 
it below the mean may extend far below 92, in which case a score of 92 
would actually correspond to, or represent, a more nearly typical score 
value. Consequently a score of 92 in one collection having a mean of 100 
could have a vastly different meaning from a score of 92 in another having 
a mean of 100, owing to differences in the variability of the two collections. 
This inadequacy of the scheme can be overcome by altering the original 
score values of the distributions so as to cause them to exhibit some same 
standard degree of variability as well as to have some standard mean value. 
This accomplished, a score of a given magnitude would have more nearly 
comparable meaning from one distribution to another. Scores whose distri- 
butions have means and standard deviations of some standard value are known 
as standard scores. The operation by which the original or raw scores 
(X-values) are converted into standard scores is known as a transformation. 
In the following section we shall consider how the X-scores may be trans- 
formed into standard scores. 


7.3 TRANSFORMING SCORES INTO STANDARD FORM


Suppose we arbitrarily decide to transform the 
set of values for which the mean is zero and the st 
Here we use zero and one as the standard ve 
deviation. The advantages of this choic 


original scores into a 
andard deviation is unity. 
lues of the mean and standard 
e should become obvious when it 


158 


STANDARD SCORES 


is recognized that in this system, a score of + 1.5 is recognized at once as 
being one and one-half & 
— 0.5 as being one-half standard deviation below the mean, ete. 

To perform this transformation we first multiply each X-score by the 
constant multiplier 1 sx, where gy is the standard deviation of the N 
distribution. By application of (6.18) where C= 1/8x, the standard 
deviation of these produets is seen to be 


indard. deviations above the mean, a score of 


—-.$y-l 
Sy 


Also by application'of (5.8) the mean of these products is seen to be 


where X represents the mean of the X distribution. Now if we add to each 
of these products the negative of their mean (i.e, — Х/ёу)® the resulting 
new collec ‘tion c of values will have a zero mean, for by application of (5.7) 
where (= — 


x we have 


x X 
Mean of new values = — "r1 (- a =0 


Sx 


Thus, by (1) multiplying each original score (X-value) by $1/s_X$ and (2)
adding $-\bar{X}/s_X$, we derive a set of standard scores having respectively the
standard values of zero and one for their mean and standard deviation.
It is common practice to represent the values of the standard scores of this
particular system (i.e., the system in which the mean is zero and the
standard deviation unity) by the lower case letter z. The two steps, that is,
(1) multiplication by $1/s_X$, and (2) addition of $-\bar{X}/s_X$, may be combined
into the following formula for transforming any X-value into the cor-
responding z-value.

$$z_i = \frac{1}{s_X} X_i - \frac{\bar{X}}{s_X} \tag{7.1}$$


By way of simple example consider the collection of five X-scores
having the values 16, 8, 10, 4, and 12. The mean and standard deviation
of these scores are 10 and 4 respectively. Now multiplying each score by
1/4 and adding -10/4 we obtain the z-scores, +1.5, -0.5, 0, -1.5, and
+0.5. As may be readily verified, these five scores have a mean of zero
and a standard deviation of one. This being the case it follows that the
value +1.5 indicates a score one and one-half standard deviations above
the mean; the value -0.5 a score one-half standard deviation below the
mean; the value 0 a score at the mean; etc. These statements are, of
course, characteristic of the corresponding X-scores. That is, since $\bar{X} = 10$
and $s_X = 4$ an X-score of 16 (corresponding to z = +1.5) is obviously one and
one-half $s_X$ values above $\bar{X}$; an X-score of 8 (corresponding to z = -0.5)
one-half $s_X$ below $\bar{X}$; etc. However, this information is not contained in
the X-values themselves (e.g., 16 and 8) whereas it is incorporated in the
z-values (e.g., +1.5 and -0.5).

*This, of course, is the same as subtracting $\bar{X}/s_X$ from each.

Since the terms of the right-hand member of (7.1) have common de- 
nominators they may be combined as follows: 


$$z_i = \frac{X_i - \bar{X}}{s_X} \tag{7.2}$$


It is perhaps more common to prescribe the transformation of X to z by 
(7.2) than by (7.1). From (7.2) it is immediately clear that the z-value 
corresponding to a given X indicates the deviation of this X-value from the 
X mean in units of the X standard deviation. Also it is clear from (7.2) 
that a z-score is a pure or abstract number as distinguished from a concrete 
or denominate number (i.e., a number applied to some specific dimension
as 6 inches or 114 IQ points). This characteristic opens the possibility of 
comparing an individual's status in one trait with his status in another. 
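
The transformation of (7.2) may be sketched computationally as follows. This is a minimal illustration only: the function names `mean`, `std_dev`, and `z_scores` are our own, and the standard deviation is assumed to use the divisor N, since that is what reproduces the value 4 quoted in the five-score example.

```python
from math import sqrt

def mean(values):
    """Arithmetic mean of a collection of scores."""
    return sum(values) / len(values)

def std_dev(values):
    """Standard deviation with divisor N (assumed; it reproduces s = 4 here)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))

def z_scores(values):
    """Apply (7.2): deviation from the mean in standard-deviation units."""
    m, s = mean(values), std_dev(values)
    return [(v - m) / s for v in values]

x_scores = [16, 8, 10, 4, 12]
z_values = z_scores(x_scores)
print(z_values)  # [1.5, -0.5, 0.0, -1.5, 0.5]
```

The printed values agree with the z-scores obtained by hand in the example of this section.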


7.4 z-SCORES AS LINEAR TRANSFORMATIONS OF THE X-SCORES


The rule for transforming X-scores into z-scores is completely specified 
in (7.1). It should be noted that this rule which involves (1) multiplication 
by a constant and (2) the addition of a constant is of the following general 
type for transforming any variable u into a variable w. 


w= au +b (7.3) 


where a and b represent any constants. 
That is, (7.1) is a special case or application of (7.3) where

$$a = \frac{1}{s_X}$$

and

$$b = -\frac{\bar{X}}{s_X}$$


Any transformation which is of the type specified by (7.3) is called a 
linear transformation because when corresponding values of u and w are
plotted as points with reference to a set of coordinate axes the points fall 
on a straight line. To illustrate this fact the corresponding z- and X-values 
for the illustrative collection of the preceding section (Table 7.1) are plotted 
in Figure 7.1. It will be observed that the points fall on a straight line. 

TABLE 7.1 Corresponding z- and X-Values (Example of Section 7.3)

       z        X
     +1.5      16
     -0.5       8
      0        10
     -1.5       4
     +0.5      12

FIGURE 7.1 Graph illustrating linear character of relationship between
corresponding X- and z-values

An important property of any linear transformation is the proportion-
ality of the difference between any pair of u values to the difference between
the corresponding w values. Let $u_1$ and $u_2$ represent any pair of u values.
Then by (7.3) the corresponding w values are

$$w_1 = au_1 + b$$

and

$$w_2 = au_2 + b$$

Now subtracting

$$w_1 - w_2 = au_1 + b - au_2 - b$$

or

$$w_1 - w_2 = a(u_1 - u_2)$$

Thus the difference between $w_1$ and $w_2$ is seen to differ from that between
$u_1$ and $u_2$ by the constant factor a. In our application, differences between
pairs of z-values will always differ from those between the corresponding
X-values by the constant factor $1/s_X$. In the foregoing example, differences
between any pair of z-values will always be one-fourth as large as the
differences between corresponding pairs of X-values.*

The important implication of this proportionality property is that
differences between pairs of z-scores must have precisely the same meaning
as differences between corresponding X-scores. If the X-scale is a rank-
order scale in which differences of a given magnitude do not have the same
meaning at one portion of the scale as at another, then the same is true of
the z-scale. Of course, more information is embodied in a z-score than in
the original X-score, since the z-score indicates position with reference to
the mean in terms of the number of standard deviations. However, the
linear transformation by which this information is incorporated does not,
in any way, impute to the z-scale any of the properties of a fundamental
scale not already present in the original X-scale.
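
The proportionality property of this section can be checked numerically. The sketch below assumes the five illustrative X-values, for which a = 1/4 and b = -10/4; the helper name `linear_transform` is ours. For every pair of scores it forms the ratio of the difference in w to the difference in u.

```python
from itertools import combinations

def linear_transform(u, a, b):
    """The general linear rule (7.3): w = a*u + b."""
    return a * u + b

x_scores = [16, 8, 10, 4, 12]
a, b = 1 / 4, -10 / 4          # 1/s_X and -mean/s_X for this collection

ratios = []
for u1, u2 in combinations(x_scores, 2):
    w1, w2 = linear_transform(u1, a, b), linear_transform(u2, a, b)
    ratios.append((w1 - w2) / (u1 - u2))

print(set(ratios))  # {0.25}: every pair gives the same factor a
```

Because the ratio is the same for all ten pairs, the z-scale carries exactly the same information about differences as the X-scale, merely expressed in different units.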


7.5 SOME PROPERTIES OF THE z-SCALE


The following properties are more or less implicit in the definition of 
the z-transformation. They are of sufficient importance, however, to 
warrant explicit statement here, if only for sake of emphasis. 

Let a given collection of X-values be transformed into z-values by 
application of (7.1) or (7.2). Then the following statements apply to the 
resulting z-scale. 


(1) The mean of the z-values is zero. That is,

$$\bar{z} = 0 \tag{7.4}$$

(2) The sum of the z-values is zero. Or symbolically,

$$\sum z_i = 0 \tag{7.5}$$

(3) The standard deviation (or variance) of the z-values is unity. That is,

$$s_z = 1 \tag{7.6}$$

(4) The sum of the squares of the z-values equals their number. Or sym-
bolically,

$$\sum z_i^2 = N \tag{7.7}$$


The first and third properties, (7.4) and (7.6), of course, follow directly
from the definition of the z-transformation. That is, the transformation
was so defined as to lead to a set of scores which would have for their mean
and standard deviation the arbitrarily selected values zero and one. The
second property (7.5) follows from the first by application of (5.2). Finally,
if the standard deviation of the z-values is one, their variance must also
be one, and we may write

$$\frac{\sum z_i^2}{N} = 1$$

or

$$\sum z_i^2 = N$$

which establishes the fourth property (7.7).

*The student should verify this for selected pairs of z- and X-values from Table 7.1.
For example, consider the first two z-scores. The difference between them is 2, which is
one-fourth of the difference between the corresponding X-values.


The student may find it instructive to verify these properties using 
the distribution of z-values given in Table 7.1. 
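
One way of carrying out this verification is sketched below. It assumes the five z-values of Table 7.1 and a standard deviation computed with the divisor N; the variable names are ours.

```python
from math import sqrt

z_values = [1.5, -0.5, 0.0, -1.5, 0.5]   # z column of Table 7.1
n = len(z_values)

z_sum = sum(z_values)                                        # property (7.5)
z_mean = z_sum / n                                           # property (7.4)
z_sd = sqrt(sum((z - z_mean) ** 2 for z in z_values) / n)    # property (7.6)
z_sum_sq = sum(z * z for z in z_values)                      # property (7.7)

print(z_mean, z_sum, z_sd, z_sum_sq)  # 0.0 0.0 1.0 5.0
```

The last figure printed equals N = 5, as (7.7) requires.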


7.6 OTHER SYSTEMS OF STANDARD SCORES


The z system of standard scores involves the transformation of the
original scores to a standard set having a mean of zero and a standard
deviation of one. The values zero and one represent purely arbitrary choices.
Of course, they represent advantageous choices in that they result in z-
values which are directly interpretable as deviations from the mean in
units of standard deviation. However, other choices may be made which
also incorporate this same information. Suppose, for example, that it is
desired for some reason to establish a system in which the mean is taken
to be 50 and the standard deviation to be 10. Then in such a system a
score value such as, say, 40, is immediately recognized as being one standard
deviation below the mean. Such a system incorporates the same type of
information in its score values as does the z system. It may be argued that
this information is not presented as directly in such a system as in the z
system. While this is to some degree true, such a system may have other
advantages. For example, it may render the use of signed values or of
values involving decimal fractions unnecessary. These advantages are
particularly important in situations in which it is desired to carry out
certain statistical computations using the standard-score values.

In this section we shall consider systems which employ values other
than zero and one as standard values for the mean and standard deviation.
We shall begin with the general case in which it is desired that the mean
and standard deviation of the transformed values be M and S respectively.
As in the case of the z-transformation the first step calls for the multiplica-
tion of each score by a constant multiplier. If this multiplier is taken to be
$S/s_X$, then, by application of (6.18) with $C = S/s_X$, the standard deviation
of these products is seen to be

$$\frac{S}{s_X} \cdot s_X = S$$

which is the desired value.




Now by (5.8) the mean of these products is

$$\frac{S\bar{X}}{s_X}$$

Consequently if we add the constant amount

$$M - \frac{S\bar{X}}{s_X}$$

to each of these products we obtain a set of standard scores having the
desired standard values, M and S, as mean and standard deviation. The
addition of a constant amount to each of these products does not, of course,
affect their standard deviation which remains S [see (6.16)], and by appli-
cation of (5.7) with $C = M - (S\bar{X}/s_X)$ we see that the

$$\text{Mean of the new values} = \frac{S\bar{X}}{s_X} + \left(M - \frac{S\bar{X}}{s_X}\right) = M$$

which is the value desired for their mean.

The two steps involved in this transformation, namely, (1) multiplica-
tion by $S/s_X$ and (2) the addition of $M - (S\bar{X}/s_X)$, may be combined into
a single formula. Let the capital letter Z (read "cap Z") represent a score
in this system. Then

$$Z_i = \frac{S}{s_X} X_i + \left(M - \frac{S\bar{X}}{s_X}\right) \tag{7.8}$$
To illustrate the application of (7.8) in a special case we shall again
use the X-values of Table 7.1, for which the mean and standard deviation
are 10 and 4 respectively. Suppose it is desired to transform these X-values
into a set of Z-values having a mean of 50 (i.e., M = 50) and a standard
deviation of 10 (i.e., S = 10). Substituting into (7.8) we obtain

$$Z_i = \frac{10}{4} X_i + \left(50 - \frac{(10)(10)}{4}\right) = 2.5 X_i + 25 \tag{1}$$

Substitution of the $X_i$-values into (1) leads to the desired transformed
values. The original and transformed or standard-score values are shown
in Table 7.2.*

It should be observed that the transformation prescribed by (7.8),
like that prescribed by (7.1), is a linear transformation. In the case of (7.8),
the values of the constants a and b of (7.3) are

$$a = \frac{S}{s_X}$$

and

$$b = M - \frac{S\bar{X}}{s_X}$$


TABLE 7.2 Corresponding X- and Z-Values Where M = 50 and S = 10

       X        Z
      16       65
       8       45
      10       50
       4       35
      12       55

*The student may wish to verify that the mean and standard deviation of these
Z-values are 50 and 10 respectively.


As in the case of the z-transformation, the importance of this observa- 
tion lies in the proportionality property of any such transformation. That 
is to say, differences between pairs of Z-values, like those between pairs of 
z-values, can be no more useful as a basis for comparing differences in the 
amounts of some trait possessed by two individuals than are the original 
X-values themselves. 

It will be instructive to investigate the relationship between the z- and 
Z-transformation. Formula (7.8) may be rearranged as follows: 


$$Z_i = \frac{S}{s_X} X_i - \frac{S\bar{X}}{s_X} + M = S\left(\frac{X_i - \bar{X}}{s_X}\right) + M$$

And now substituting from (7.2) we obtain:

$$Z_i = S z_i + M \tag{7.9}$$


Thus the Z-transformation is seen to be in turn a linear transformation of 
the z-transformation in which the constants a and b of (7.3) take the desired 
standard values of the standard deviation and mean (i.e., S and M). In
computing Z-values it is fairly common practice to obtain z-values as an 
intermediate step, and then to obtain the Z-values by application of (7.9). 
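
The equivalence of the two routes to a Z-value, directly by (7.8) or by way of z and (7.9), may be illustrated with a short sketch. The function names `cap_z_direct` and `cap_z_via_z` are our own; the data are the five X-values of Table 7.1 with M = 50 and S = 10.

```python
def cap_z_direct(x, mean_x, sd_x, M, S):
    """Formula (7.8): multiply by S/s_X, then add M - S*mean/s_X."""
    return (S / sd_x) * x + (M - S * mean_x / sd_x)

def cap_z_via_z(x, mean_x, sd_x, M, S):
    """Formula (7.9): first obtain z by (7.2), then Z = S*z + M."""
    z = (x - mean_x) / sd_x
    return S * z + M

x_scores = [16, 8, 10, 4, 12]
direct = [cap_z_direct(x, 10, 4, 50, 10) for x in x_scores]
via_z = [cap_z_via_z(x, 10, 4, 50, 10) for x in x_scores]

print(direct)  # [65.0, 45.0, 50.0, 35.0, 55.0]
print(via_z)   # [65.0, 45.0, 50.0, 35.0, 55.0]
```

Both routes give the same values, which is simply a restatement of the fact that (7.9) is a rearrangement of (7.8).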

It is clear from (7.9) that if the situation warrants the determination
of z-values to the nearest tenth then the use of S = 10 in a Z-transformation
maintaining a like degree of accuracy will result in Z-values which are free
of decimal fractions. On the other hand, if the z-values may be determined
to two decimal places and a like degree of accuracy is to be maintained in a
Z-transformation, an S of 100 is needed to free the resulting Z-values of
decimal fractions. If an S of 10 is used with an M of 50 the system will
usually be free of negative numbers, for in this case a negative value can
occur only in the presence of an X-value which is more than five standard
deviations below the mean. Such values are, of course, extremely rare.
Similarly if an S of 100 is used, an M of 500 will usually free the system of
negative values. For these reasons the most commonly used combinations
of values for S and M are 10 with 50 and 100 with 500.
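
The two points just made, that a suitable S removes decimal fractions and that a suitable M removes negative values, may be sketched as follows. The function names are hypothetical, and the use of `round` is our own device for clearing floating-point residue; it is not part of formula (7.9).

```python
def smallest_z_without_negative(M, S):
    """Z = S*z + M is negative only when z falls below -M/S."""
    return -M / S

def rounded_cap_z(z, M, S):
    """Z by (7.9), rounded to the nearest whole number."""
    return round(S * z + M)

print(smallest_z_without_negative(50, 10))    # -5.0
print(smallest_z_without_negative(500, 100))  # -5.0
print(rounded_cap_z(-0.5, 50, 10))            # 45  (z known to tenths, S = 10)
print(rounded_cap_z(1.18, 500, 100))          # 618 (z known to hundredths, S = 100)
```

In both of the common systems a negative Z requires a score more than five standard deviations below the mean.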




In the following section a more extensive example showing the applica-
tion of the z- and Z-transformations to a collection of 200 X-values is
presented.


7.7 AN EXAMPLE COMPARING THE X-, z-, AND Z-SCALES


Table 7.3 shows the frequency distribution of a hypothetical collection 
of 200 test scores. The particular distribution involved is markedly skewed 
to the right. Table 7.3 also shows the computation of the mean and stand- 
ard deviation of this set of scores. The procedures employed involve the 
application of formulas (5.3) and (6.10). 


TABLE 7.3 The Frequency Distribution of a Hypothetical Set
of 200 Test Scores and the Computation of $\bar{X}$ and $s$

       X       f      fX      fX²
      20       1      20      400
      19       1      19      361
      18       2      36      648
      17       2      34      578
      16       3      48      768
      15       4      60      900
      14       5      70      980
      13       6      78     1014
      12       7      84     1008
      11       8      88      968
      10       9      90      900
       9      12     108      972
       8      18     144     1152
       7      26     182     1274
       6      46     276     1656
       5      35     175      875
       4      15      60      240
             200    1572    14694

$$\bar{X} = \frac{1572}{200} = 7.86$$

$$s^2 = \frac{14694}{200} - (7.86)^2 = 73.47 - 61.7796 = 11.6904$$

$$s = 3.42$$

$$1/s = 0.29$$

$$-\bar{X}/s = -2.30$$


Table 7.4 shows the z and $Z_{100}$ (i.e., S = 100, M = 500) values cor-
responding to each X-value. The z-values corresponding to each X were
obtained by first multiplying each X-value by 1/s, that is, by 0.29 (see
third column of Table 7.4), and then adding to each of these products the
negative of $\bar{X}/s$, that is, -2.30. The $Z_{100}$ values were obtained from the
z values by application of (7.9). It is obvious that the frequencies are dis-
tributed in precisely the same pattern regardless of which scale is involved.
In other words, the form of the distribution is unaffected by the trans-
formation. Figure 7.2 shows the polygon for this frequency distribution
with reference to all three scales, which have been placed in juxtaposition
at the base of the figure. This alignment of these scales illustrates the
proportionality property of linear transformations by showing that differ-
ences between corresponding pairs of values may be represented by the
same physical distance along each of the scales. It is particularly important
to note that the form of the distribution is invariant under a linear trans-
formation. The critical aspect of this fact in connection with the inter-
pretation of standard-score values will be treated in a later section.

TABLE 7.4 The z- and $Z_{100}$-Values Corresponding to Each
X-Value of the Distribution of Table 7.3

       X       f     0.29X    z = 0.29X - 2.30    $Z_{100}$ = 100z + 500
      20       1     5.80          +3.50                  850
      19       1     5.51          +3.21                  821
      18       2     5.22          +2.92                  792
      17       2     4.93          +2.63                  763
      16       3     4.64          +2.34                  734
      15       4     4.35          +2.05                  705
      14       5     4.06          +1.76                  676
      13       6     3.77          +1.47                  647
      12       7     3.48          +1.18                  618
      11       8     3.19          +0.89                  589
      10       9     2.90          +0.60                  560
       9      12     2.61          +0.31                  531
       8      18     2.32          +0.02                  502
       7      26     2.03          -0.27                  473
       6      46     1.74          -0.56                  444
       5      35     1.45          -0.85                  415
       4      15     1.16          -1.14                  386
             200

FIGURE 7.2 Polygon of hypothetical score distribution of Table 7.3 with
reference to X-, z- and Z-scales
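
The computations of Tables 7.3 and 7.4 can be retraced with a brief sketch. It assumes the frequencies as tabled, a standard deviation with divisor N, and the same two-place rounding of the constants 1/s and -X̄/s that the text uses; the names `freq`, `z_of`, and `z100_of` are ours.

```python
from math import sqrt

# X-value -> frequency, copied from Table 7.3
freq = {20: 1, 19: 1, 18: 2, 17: 2, 16: 3, 15: 4, 14: 5, 13: 6, 12: 7,
        11: 8, 10: 9, 9: 12, 8: 18, 7: 26, 6: 46, 5: 35, 4: 15}

n = sum(freq.values())
sum_fx = sum(f * x for x, f in freq.items())
sum_fx2 = sum(f * x * x for x, f in freq.items())

mean_x = sum_fx / n
sd_x = sqrt(sum_fx2 / n - mean_x ** 2)

# Constants rounded to two places, as in the text (0.29 and -2.30)
multiplier = round(1 / sd_x, 2)
addend = round(-mean_x / sd_x, 2)

def z_of(x):
    """z-value to two places from the rounded constants."""
    return round(multiplier * x + addend, 2)

def z100_of(x):
    """Z-value for S = 100, M = 500, by formula (7.9)."""
    return round(100 * z_of(x) + 500)

for x in sorted(freq, reverse=True):
    print(x, freq[x], z_of(x), z100_of(x))
```

Each frequency stays attached to its own score throughout, which is why the form of the distribution is the same on all three scales.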


7.8 INTERPRETING STANDARD SCORES DERIVED FOR DIFFERENT
REFERENCE GROUPS


Consider Pupils A and B who belong to Reference Groups I and II
respectively. Suppose that both A and B made the same raw score on some
test. If this test is accurate then A and B clearly possess equal amounts of
the trait measured. This is, of course, not to say that the z-scores cor-
responding to this raw-score value will necessarily be the same in the case
of both reference groups. Obviously these z-scores can be equal only if
the X means and standard deviations are the same for both groups [see
(7.1) or (7.2)]. Like percentile ranks, standard scores are derived with
reference to a particular group or collection of scores. If the mean and
standard deviation of a particular reference collection of X-values differ
from those of some other reference collection, then the standard scores
derived for these collections are not comparable. That is, equal standard
scores will not correspond to equal raw scores and cannot, therefore, be
interpreted as representing equal amounts of the trait involved.

The interpretation of standard scores derived from different reference 
groups is further enhanced by knowledge of the forms of the two score 
distributions even when it is known that the two distributions have equal 
means and standard deviations. Of course, if the two reference groups have 
equal means and standard deviations, equal standard scores correspond to 
equal raw scores, and, to the extent that these scores are accurate, to equal 
amounts of the trait involved. So far, then, as indicating equal amounts 
of a trait is concerned, the standard scores are only as good as the raw 
scores themselves. The standard scores are more meaningful only in the 
sense that they have incorporated in them information regarding their 
mean and standard deviation. It cannot be inferred, however, that equal 
standard scores derived from reference groups having equal means and 
standard deviations have equal percentile ranks any more than it can be 
inferred that equal raw scores have equal percentile ranks. This follows
from the fact that the two (or more) reference groups may differ in respects
other than central tendency and variability. To whatever extent they may 
thus differ, equal raw scores (or their standard-score equivalents) will tend 
to hold differing ranks in their respective groups. 

By way of illustration two hypothetical frequency distributions which
are mirror images of each other are presented in Table 7.5. Both of these
distributions have the same mean ($\bar{X} = 16$) and the same standard devia-
tion ($s = 3.643$), but Distribution I is negatively skewed while Distribution
II is positively skewed. Since the means and standard deviations are the
same it necessarily follows that equal raw scores in these distributions also
have equal standard scores. The raw score of 20, for example, corresponds
to a standard score of +1.10 in both distributions. It will be observed,
however, that in Distribution I this score value is the largest involved and
has an estimated PR-value of 95.7. In Distribution II, on the other hand,
this same score value has an estimated PR-value some ten points lower
(PR = 84.6). In general, positive z-values of a given magnitude have lower
percentile ranks in positively than in negatively skewed distributions,
whereas the reverse is true of negative z-values. It is in this way that
knowledge of the form of the distribution of the reference collection con-
tributes to the interpretation of z-score values.

TABLE 7.5 PR- and z-Values Corresponding to Each Unit
Point in Two Hypothetical Distributions Which
Have Equal Means and Standard Deviations but
Which Are Skewed in Opposite Directions

                 DISTRIBUTION I          DISTRIBUTION II
       z        X      f     PR         X      f     PR
     +3.29     28                      28      1    99.7
     +3.02     27                      27      2    98.9
     +2.74     26                      26      2    97.8
     +2.47     25                      25      2    96.8
     +2.20     24                      24      3    95.4
     +1.92     23                      23      4    93.5
     +1.65     22                      22      5    91.1
     +1.37     21                      21      6    88.1
     +1.10     20     16    95.7       20      7    84.6
     +0.82     19     40    80.5       19      8    80.5
     +0.55     18     30    61.6       18      9    75.9
     +0.27     17     22    47.6       17     12    70.3
      0.00     16     16    37.3       16     16    62.7
     -0.27     15     12    29.7       15     22    52.4
     -0.55     14      9    24.1       14     30    38.4
     -0.82     13      8    19.5       13     40    19.5
     -1.10     12      7    15.4       12     16     4.3
     -1.37     11      6    11.9       11
     -1.65     10      5     8.9       10
     -1.92      9      4     6.5        9
     -2.20      8      3     4.6        8
     -2.47      7      2     3.2        7
     -2.74      6      2     2.2        6
     -3.02      5      2     1.1        5
     -3.29      4      1     0.3        4
                     185                     185

For both distributions $\bar{X} = 16$ and $s = 3.643$
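
The contrast drawn from Table 7.5 may be sketched computationally. The sketch assumes the frequencies of Distribution I as tabled, builds Distribution II as its mirror image about X = 16, and assumes that a percentile rank is the percentage of cases below a score plus half of those at it, a convention that reproduces the tabled PR values. The function names are ours.

```python
from math import sqrt

# Frequencies of Distribution I from Table 7.5 (score -> frequency)
dist_1 = {20: 16, 19: 40, 18: 30, 17: 22, 16: 16, 15: 12, 14: 9, 13: 8,
          12: 7, 11: 6, 10: 5, 9: 4, 8: 3, 7: 2, 6: 2, 5: 2, 4: 1}
# Distribution II is the mirror image of Distribution I about X = 16
dist_2 = {32 - x: f for x, f in dist_1.items()}

def mean_and_sd(freq):
    """Mean and standard deviation (divisor N) of a frequency distribution."""
    n = sum(freq.values())
    m = sum(f * x for x, f in freq.items()) / n
    var = sum(f * (x - m) ** 2 for x, f in freq.items()) / n
    return m, sqrt(var)

def percentile_rank(freq, score):
    """Percent of cases below the score plus half of those at the score."""
    n = sum(freq.values())
    below = sum(f for x, f in freq.items() if x < score)
    return 100 * (below + freq.get(score, 0) / 2) / n

for name, dist in (("I", dist_1), ("II", dist_2)):
    m, s = mean_and_sd(dist)
    z = (20 - m) / s
    print(name, round(m, 2), round(z, 2), round(percentile_rank(dist, 20), 1))
```

The raw score of 20 has the same z-value in both distributions, yet its percentile rank is 95.7 in one and 84.6 in the other.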


7.9 INTERPRETING STANDARD SCORES DERIVED FROM DIFFERENT
RAW-SCORE SCALES


It was stated in Section 7.3 that a z-score is a pure or abstract number
and that this characteristic opens the possibility for comparative state-
ments about an individual's status in one trait as against his status in
another. A school pupil, for example, might obtain raw scores on spelling
and arithmetic tests of 50 and 20 respectively. These raw scores are
necessarily in terms of completely different units and are not comparable.
If, however, for a given reference group these raw-score values correspond
to z-values of -0.5 and +2.0, then it is clear that in comparison with the
pupils comprising the reference group this pupil is much more able in arith-
metic than he is in spelling. Such standard scores are comparable in the
sense that they belong to collections whose means and standard deviations
have known standard values.

In interpreting or comparing the amounts of different traits possessed
by a given individual where the measured amounts are expressed as stand-
ard scores, it is important to keep in mind the fact that two (or more) equal
standard-score values do not necessarily imply that the individual's rank
in the reference group is the same for both traits involved. As was pointed out
in the foregoing section, the percentile rank of a given standard score
depends upon the form of the score distribution. This shortcoming suggests
that standard scores could be considerably improved if in addition to
involving a standard M and a standard S, they could be made also to involve
a standard form of distribution; for, then, equal standard scores would
imply like ranking in the reference group. It is not possible to accomplish
this refinement by means of a linear transformation, for under such trans-
formations the form of the distribution remains unchanged (see Section 7.7).
In the following chapter we shall consider a different type of transformation
which will lead to standard scores that have a standard form of distribution
as well as a standard mean and standard deviation.

An important advantage of percentile ranks over standard scores of
the type considered in this chapter arises from the fact that an individual's
rank in a given group with reference to a given trait is invariant under
changes in scale within the limits of the accuracy of the scales. Clearly, an
individual's rank in a given group with reference to the amount he possesses
of some given trait is what it is, regardless of the system by which the
amounts are measured, so long as the system is accurate.* An individual
who ranks at the 80th percentile in height in a given group will remain at
this rank whether the heights are measured in terms of inches or centimeters
if the measurements are accurately made.

*Students familiar with the terminology of measurement will recognize that the word
accurate as used here means both reliable and valid.
Standard scores of the type considered in this chapter are not necessarily
invariant under changes in scale, owing to the fact that such changes may
lead to score distributions differing in form. An easy spelling test, for
example, would result in a distribution of scores skewed to the left, and a
difficult test in a distribution skewed to the right. Yet if both tests provide
accurate measures of spelling ability the best speller would rank first, the
second best speller second, etc., regardless of which test is used. Their
standard scores, however, would vary with the test employed.

To illustrate, the scores made by the same group of individuals on two
hypothetical tests of the same trait are shown in Table 7.6. The tests are
assumed to be completely accurate and hence must necessarily rank these
individuals in the same way. Actually, the scores on Test II are simply
the cubes of the scores on Test I. The cubing, of course, changes the form of the
score distribution from perfectly symmetrical to markedly skewed to the
right. The two distributions of standard scores (z-values) and the two
sets of percentile ranks are also shown in this table. It will be noted that
a given individual's standard score differs from scale (test) to scale whereas
his percentile rank is the same.

TABLE 7.6 Raw and Standard Scores* Made by 10 Individuals
on Two Completely Accurate Hypothetical Rank-
Order Scales Measuring the Same Trait

                   RAW SCORES        STANDARD SCORES     PERCENTILE RANKS
INDIVIDUALS     Test I   Test II     Test I   Test II     Test I   Test II
     A             4       64        +1.83    +2.61         95       95
     B             3       27        +0.91    +0.63         80       80
     C             3       27        +0.91    +0.63         80       80
     D             2        8         0.00    -0.38         50       50
     E             2        8         0.00    -0.38         50       50
     F             2        8         0.00    -0.38         50       50
     G             2        8         0.00    -0.38         50       50
     H             1        1        -0.91    -0.76         20       20
     I             1        1        -0.91    -0.76         20       20
     J             0        0        -1.83    -0.81          5        5
   Mean            2       15.2       0.00     0.00
   Median          2        8         0.00    -0.38
   s             1.095    18.713      1.00     1.00

*Scores reported to nearest 100th.
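
The behaviour shown in Table 7.6 may be reproduced with a short sketch. It assumes the Test I scores as tabled, forms the Test II scores by cubing, and uses the same percentile-rank convention as before (cases below plus half of the ties); the function names are ours.

```python
from math import sqrt

test_1 = {"A": 4, "B": 3, "C": 3, "D": 2, "E": 2,
          "F": 2, "G": 2, "H": 1, "I": 1, "J": 0}
# Test II scores are the cubes of the Test I scores
test_2 = {person: score ** 3 for person, score in test_1.items()}

def z_scores(scores):
    """z-values (divisor N), reported to the nearest hundredth."""
    values = list(scores.values())
    m = sum(values) / len(values)
    s = sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return {p: round((v - m) / s, 2) for p, v in scores.items()}

def percentile_ranks(scores):
    """Percent of cases below each score plus half of those tied with it."""
    values = list(scores.values())
    n = len(values)
    ranks = {}
    for p, v in scores.items():
        below = sum(1 for w in values if w < v)
        tied = sum(1 for w in values if w == v)
        ranks[p] = 100 * (below + tied / 2) / n
    return ranks

print(z_scores(test_1)["A"], z_scores(test_2)["A"])          # 1.83 2.61
print(percentile_ranks(test_1) == percentile_ranks(test_2))  # True
```

Cubing preserves the order of the scores and their ties, so the percentile ranks are untouched, while the z-values shift because the form of the distribution has changed.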




7.10 TEST-BATTERY COMPOSITE SCORES


Since many traits are highly complex in character, it is not uncommon 
to find that tests designed to measure such traits consist of parts or subtests 
devoted to the measurement of the relatively more specific aspects of the 
whole. Such a collection of subtests is often referred to as a test battery, 
and it is a common practice to combine the subtest scores into a single 
composite score for the battery. This composite score is then treated as a 
measure of the complex trait as a whole. Thus, in the measurement of 
achievement at the elementary school level, the subtests of a battery might 
include tests in reading comprehension, in arithmetic problem-solving, in
the various language skills, etc. The status of a pupil's achievement on the
whole could then be assessed by combining into a single composite score 
his scores on the various subtests comprising the battery. Similarly, an 
over-all measurement of an individual's intelligence might be derived from 
a composite of scores on subtests dealing with ability to understand verbally 
expressed ideas, ability to reason, ability to use numbers, etc. 

Many difficult problems are encountered in the combining of scores
derived from subtests involving different scales, i.e., different units of
measurement. One difficulty arises from the fact that the character of
some subtest scales may be such as to cause these subtests to contribute a
disproportionate weight to the composite. If, for example, a score (number
right) on a 10-item problem test were added to a score on a 100-item true-
false test, it would seem reasonable to expect that the latter score would
be represented in the resulting composite to a far greater degree than the
former. Although it is true that this result is to be expected, the fact re-
mains that the number of items in itself is not a factor which determines
the contribution of a subtest score to a composite. Consider, for example,
a set of composite scores formed by adding the scores on a 10-item problem
test, with the scores ranging from 0 to 10, to the scores on a 100-item true-
false test on which every pupil made the same score so that the range is
zero. Clearly, the differentiation among the pupils is entirely due to the
scores derived from the 10-item test. While this is a trivial example it does
suggest that one factor contributing to the weight of a test in a composite
is the variability of its score distribution.

TABLE 7.7 Means and Standard Deviations of Hypothetical
Score Distributions for Two Tests Together with
Raw Scores Made by Two Individuals

                          TEST I          TEST II
                          30-Item         200-Item
                          Problem         True-False
                          Test            Test            Composite
  Means                     15              100
  Standard Deviations        6                3
  A's Scores                21               97              118
  B's Scores                 9              103              112

As a further illustration, consider a 30-item problem test and a 200-
item true-false test. Suppose that for the group involved the means for
these tests are 15 and 100 respectively and that the standard deviations
are 6 and 3 (see Table 7.7). Suppose further that Individual A in this
group makes scores one standard deviation above the mean on the problem
test and one standard deviation below the mean on the true-false test,
and that Individual B makes scores the reverse of these. Table 7.7 sum-
marizes the situation. If the two tests are to carry equal weight in the
total or composite score then A and B ought to receive equal composite
scores, for while their test performances are reversed, each, nevertheless,
scored one standard deviation above the mean on one of the tests and one
standard deviation below the mean on the other. Reference to the com-
posite scores given in Table 7.7 shows that A, whose better performance
was on the 30-item problem test, receives a higher composite score
than B whose poorer performance was on this test. Thus the problem test
which had the more variable distribution (s = 6) contributes more to the
composite than the true-false test which had the less variable distribution
(s = 3). If we now multiply (weight) each true-false test score by the
constant factor two, then the score distributions become equally variable
[see (6.18)] and the composite scores for A and B take the same value.*
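
The effect of the weighting may be sketched as follows, using the figures of Table 7.7. The dictionary layout and the helper names `composite` and `z_composite` are our own; the weight of two for the true-false test is simply the ratio of the two standard deviations, 6/3.

```python
means = {"problem": 15, "true_false": 100}
sds = {"problem": 6, "true_false": 3}
scores = {"A": {"problem": 21, "true_false": 97},
          "B": {"problem": 9, "true_false": 103}}

def composite(person_scores, weights):
    """Weighted sum of a person's subtest scores."""
    return sum(weights[t] * s for t, s in person_scores.items())

unit_weights = {"problem": 1, "true_false": 1}
# Weight the true-false test by 6/3 = 2 so both tests are equally variable
equalizing = {"problem": 1, "true_false": sds["problem"] / sds["true_false"]}

raw = {p: composite(s, unit_weights) for p, s in scores.items()}
weighted = {p: composite(s, equalizing) for p, s in scores.items()}

def z_composite(person_scores):
    """Sum of z-scores, an equivalent way of equalizing variability."""
    return sum((s - means[t]) / sds[t] for t, s in person_scores.items())

print(raw)       # {'A': 118, 'B': 112}
print(weighted)  # {'A': 215.0, 'B': 215.0}
print({p: z_composite(s) for p, s in scores.items()})  # {'A': 0.0, 'B': 0.0}
```

Either device, weighting by two or converting to z-scores, gives A and B the same composite.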

The example may seem to imply that variability is the only factor
determining the contribution of a subtest score to a composite, and that
to assure equal contribution from all subtests it is sufficient to weight
subtest scores so as to make all subtests equally variable, a weighting
easily accomplished by putting all scores into standard score form. Actually
the problem is not this simple, particularly in situations in which more than
two subtests enter into the formation of the composite. For one thing the
contribution of a subtest score to the composite also depends upon the
degree of relationship (agreement) between the performances of the indi-
viduals on this subtest and their performances on the other subtests in-
volved.† For another, there may exist logical objections to using weights
which are functions of variability alone. For example, if some of the sub-
tests are considerably more accurate (more valid and reliable) than others,
it would hardly seem justifiable to consider all subtests as more or less on a
par in the establishment of a composite. Consequently, the use of standard
scores (i.e., weightings which lead to equally variable score distributions)
to form a composite is defensible only in the case of batteries composed of
subtests which are approximately equally accurate and which lead to score
distributions that bear about the same degree of interrelationship. These
conditions are not quite as restricting as they may seem, for they tend to
be reasonably well satisfied in the case of many test batteries, particularly
aptitude and achievement test batteries.

*The respective weighted scores for A and B on the true-false test are 194 (i.e.,
2 x 97) and 206 (i.e., 2 x 103), and their respective composite scores are now 21 + 194
= 215 and 9 + 206 = 215.

†Quantitative analysis of the relationship between two sets of scores for the same in-
dividuals is a subject of later chapters.

We shall conclude this section with a simple hypothetical numerical 
example based on the performances of ten subjects on a test battery involv- 
ing three subtests. The interrelationships among these tests differ so that 
the conditions cited above are not fully satisfied. For purposes of illustra- 
tion we shall, nevertheless, adjust (weight) the scores so as to make each 
test set equally variable, and then form a composite from the adjusted 
scores. We could accomplish this by multiplying each of the scores in a set 
by the reciprocal of their standard deviation (i.e., by 1/S). The adjusted
scores of each set would then have a standard deviation of one [see (6.18)].
We shall, however, take the additional step of subtracting the value X̄/S
for each set from each adjusted score of the set, thus converting to z-scores
[see (7.1)]. The raw scores for each test together with their percentile
ranks, means, and standard deviations are shown in Table 7.8. The z-scores, 


TABLE 7.8 Scores Made by a Group of Ten Pupils on Each of
Three Subtests Together with Percentile Ranks,
Means, and Standard Deviations

              Test I         Test II        Test III
Pupil        X     PR       X     PR       X     PR
A           12     75      42     45      57     95
B            8     55     100     95      25     35
C            5     35       9      5      29     45
D            2     15      18     15      18     15
E           15     95      61     65      42     65
F           11     65      66     75      34     55
G            7     45      50     55      45     75
H            4     25      29     35      15      5
I            1      5      20     25      22     25
J           14     85      70     85      50     85
X̄          7.9            46.5           33.7
S          4.70           26.95          13.55


the composite scores, the percentile ranks of the composite scores, and the 
mean of each pupil's z-scores are given in Table 7.9. The percentile ranks 
of the z-values are not shown since they would, of course, be precisely the 
same as those of the X-values.


There are two important phenomena illustrated by these data. First, 




TABLE 7.9 Standard Scores and Composite Scores for Test
Data of Table 7.8

Pupil     z_I      z_II     z_III    Composite   PR of Composite   Composite/3
A        +0.88    -0.17    +1.72                                     +0.81
B        +0.02    +1.98    -0.64                                     +0.45
C        -0.62    -1.39    -0.35                                     -0.79
D        -1.26    -1.06    -1.16                                     -1.16
E        +1.51             +0.61                                     +0.89
F        +0.66             +0.02                       65            +0.47
G        -0.19             +0.83                                     +0.26
H        -0.83             -1.38                                     -0.95
I        -1.47             -0.86                       15            -1.10
J        +1.30    +0.87    +1.20                       95            +1.12
Mean      0        0        0                                         0
SD        1        1        1                                         0.85

it should be observed that the highest ranking pupil on the basis of composite
score (Pupil J) is not highest in rank on any single one of the tests.


In other words, under certain circumstances, the percentile rank of an individual's
composite score may be higher (or lower; see Pupil D) than the
highest (or lowest) of the percentile ranks of the subtest scores upon which
his composite is based. This follows (1) from the fact that while such an
individual does not make the highest (or lowest) score on any of the subtests,
his performance on each is consistently high (or low); and (2) from
the fact that those individuals whose performances surpass (or fall below)
his on one subtest are different from those whose performances surpass his
on the other subtests. For example, Pupil A whose performance surpassed
that of Pupil J on Subtest ITI was so far inferior to J on Subtests I and II 
that his composite score was below that of J, who maintained a consistently 
high level of performance on all three tests. 

Second, it should be observed that the means of each pupil's z-scores
(see the last column of Table 7.9) have a standard deviation of .85, a value
less than one. Hence, it is clear that means of z-scores are not themselves
z-scores. This follows from the fact that means necessarily vary less than
the individual scores which enter into their formation.
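
The computations behind Tables 7.8 and 7.9 can be sketched in a few lines of Python. This is an illustrative sketch only: the function and variable names are our own, and the standard deviation is assumed to be taken with N (not N - 1) in the denominator, which is the form that reproduces the S-values of Table 7.8. Individual values may differ from the printed tables in the last decimal place because of rounding.

```python
from math import sqrt

# Raw scores from Table 7.8 (pupils A through J).
pupils = "ABCDEFGHIJ"
tests = {
    "I":   [12, 8, 5, 2, 15, 11, 7, 4, 1, 14],
    "II":  [42, 100, 9, 18, 61, 66, 50, 29, 20, 70],
    "III": [57, 25, 29, 18, 42, 34, 45, 15, 22, 50],
}

def mean(values):
    return sum(values) / len(values)

def sd(values):
    """Standard deviation with N in the denominator (assumed convention)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))

def z_scores(values):
    """Convert raw scores to z-scores: (X - mean) / S."""
    m, s = mean(values), sd(values)
    return [(v - m) / s for v in values]

z = {name: z_scores(scores) for name, scores in tests.items()}

# Composite = sum of the three z-scores; mean_z = composite / 3.
composite = [sum(z[name][i] for name in tests) for i in range(len(pupils))]
mean_z = [c / 3 for c in composite]

best = pupils[composite.index(max(composite))]
print("Highest composite:", best)
print("SD of mean z-scores:", round(sd(mean_z), 2))
```

Run as written, the sketch should single out Pupil J and give a standard deviation of about .85 for the means of the z-scores, in agreement with the two observations made in the text.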

Finally, it may be observed that occasionally it is recommended that a
composite be formed from the subtest PR-values rather than z-scores.
This, in general, is a practice which should be avoided. Such composite
scores, at best, can be no better than those formed from standard scores
and may be (indeed usually are) much less defensible. Since PR-values
are rank values, any information represented in differences between raw-score
values is completely ignored when PR-values are used. If such differences
have any meaning at all, full account is taken of them in the z-values,
since the differences between corresponding pairs of z- and X-values are
proportional. Even if such differences are relatively meaningless, so that
composite scores are useful only for ranking purposes, the ranks derived
from composites formed of z-values will usually not be sufficiently different
from those derived from composites formed of PR-values to warrant the
use of the latter.
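
The remark that ranks based on PR-composites usually differ little from ranks based on z-composites can be checked for the ten pupils of Table 7.8. The sketch below is illustrative only; the helper names are hypothetical, and the standard deviation is assumed to use N in the denominator.

```python
from math import sqrt

pupils = "ABCDEFGHIJ"

# Raw scores and percentile ranks for Tests I, II, and III (Table 7.8).
raw = [
    [12, 8, 5, 2, 15, 11, 7, 4, 1, 14],
    [42, 100, 9, 18, 61, 66, 50, 29, 20, 70],
    [57, 25, 29, 18, 42, 34, 45, 15, 22, 50],
]
pr = [
    [75, 55, 35, 15, 95, 65, 45, 25, 5, 85],
    [45, 95, 5, 15, 65, 75, 55, 35, 25, 85],
    [95, 35, 45, 15, 65, 55, 75, 5, 25, 85],
]

def z_scores(values):
    n = len(values)
    m = sum(values) / n
    s = sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / s for v in values]

def ranking(composites):
    """Pupils ordered from highest to lowest composite."""
    order = sorted(range(len(pupils)), key=lambda i: composites[i], reverse=True)
    return [pupils[i] for i in order]

# Sum across the three subtests, pupil by pupil.
z_composite = [sum(col) for col in zip(*(z_scores(t) for t in raw))]
pr_composite = [sum(col) for col in zip(*pr)]

print("by z-composite: ", ranking(z_composite))
print("by PR-composite:", ranking(pr_composite))
```

For these particular data the two orderings coincide, even though the PR-composite takes no account of the size of the raw-score differences.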




THE NORMAL CURVE 


8.1 INTRODUCTION 
Engineers and scientists have long recognized the usefulness of models
in advancing knowledge. The aircraft engineer, for example, may test a
carefully constructed scale-model airplane in still another model simulating
flight conditions, that is, in a wind tunnel. While the model plane may be
like the real plane in certain respects such as shape or center of gravity, it
will differ in many other respects such as size and weight. The model may,
nevertheless, be quite useful for studying certain performance characteristics
of its real counterpart, provided these characteristics are functions
only of those aspects of the real plane which are duplicated in the model.
If this is not the case, answers derived from a study of the model cannot be
generalized to the real world.

The model plane and the wind tunnel are examples of useful physical
models. Many of the models of most use to the scientist, however, are
symbolic or mathematical models. Such models are useful for much the
same reasons that physical models are. They are easier and cheaper to
construct, and they have been found to work.

In many instances, because the exact nature of the real thing is unknown,
the model is constructed in conformance with, or as a replica of,
some theory. Insofar as the behavior of the model is shown to be in conformity
with what is observed to occur in the real world, confidence in the
theory as represented by the model develops. If discrepancies are observed,
the theory and, of course, its model must be altered. The work of
Einstein resulted from such a failure of a previously accepted model.

The models used in the study of statistics are symbolic or mathematical
models. They are usually intended to represent some theoretical or ideal 
collection of values. This chapter is concerned with a model of this type. 

In introducing our discussion of variability (see Section 6.1), we stated 
that one of the most important applications of indexes of variability was in 
the study of errors of estimation in situations in which the true value being 
estimated is unknown. Such situations, we pointed out, always arise in 
the case of measurement of continuous attributes or in attempts to estimate 
some population fact—for example, the mean IQ of all United States 
children of ages 3 through 15—by using a sample taken from the popula- 
tion. It was suggested that the accuracy (or inaccuracy) of any such 
estimating procedure is described by the variability in a number of inde- 
pendent estimates (of the same true value) obtained by repeating this 
procedure. 

It has long been an observed fact that if the independent estimates 
differ only as a result of the operations of accidental or chance factors,* a 
frequency distribution of such estimates will tend to follow a rather definite 
pattern. If the estimating technique is free from bias, the estimates tend 
to cluster about a value which approaches the unknown true value being 
estimated. In other words, many of the estimates fall relatively close to
the true value. Occasionally, however, the vagaries of chance become more
pronounced and the resulting estimates more deviant. Even gross variations
may occur on rare occasions. Moreover, if there is no bias, deviations
of a given magnitude would be found just as frequently in one direction as
in the other. A frequency distribution of such estimates would, therefore,
be unimodal and symmetrical, that is, bell-shaped (see Histograms C, D,
and E of Figure 2.4). The idealized smoothed frequency polygon of such a
distribution of estimates would have the appearance of Curves A or B of
Figure 4.12. In this figure the A curve with its lesser degree of variability


would picture the distribution of estimates for the more accurate estimating
procedure.

The frequency distribution of errors of estimation would have precisely
the same appearance as that of the estimates themselves, since the algebraic
magnitudes of the errors differ from those of the corresponding estimates
by a constant amount (i.e., the true value).‡ If the estimating procedure
is free from bias, the distribution of errors clusters about zero, whereas the
distribution of estimates is centered on the true value. In all other respects
the two distributions are identical (see Figure 6.2).

*For example, in measuring lengths of objects, the zero end of the ruler or tape might
accidentally not be placed in precise alignment with one end of the object; or the
measurement might accidentally be misread; or in the sampling example the composition
of the sample would almost certainly be expected to be subject to chance variations from
repetition to repetition.
†See also Section 6.8.
‡Error equals estimated value minus true value.

Since it is obviously impossible in practice to measure directly the 
errors made in estimating unknown true values, it is not surprising that 
attempts have been made to provide an ideal or theoretical model of the 
expected error distribution. What is surprising is the fact that a model 
adopted for this purpose some century and a half ago* should have with- 
stood to this day numerous empirical checks against the “real world" to 
remain as one of our most useful tools in the study of chance errors. It is
this model which at some point—not definitely established—in its history 
came to be known as the "normal" curve. In this chapter we shall discuss 
this model together with certain of its applications. Consideration of the 
application in which we are most interested, that is, the study of sampling 
errors, will be deferred to subsequent chapters. 


8.2 THE NORMAL CURVE DEFINED


A normal curve is a graphical plot of a particular type of mathematical
function.† This function produces a plot which is unimodal and symmetrical.
Its value is never negative. It is necessary that a function used
as a model of a frequency distribution never assume a value less than zero,
since a frequency count can never be less than zero.

Actually there are many mathematical functions which are always
positive and which would produce plots that are unimodal and symmetrical.
The normal-curve function, however, was actually derived mathematically
on the assumption of independent repetitions of some operation which
differ in outcome only because of the operation of accidental or chance
factors.‡ This not only explains why it was chosen from among the various
functions possible as a model error-distribution function, but also undoubtedly
explains why it has stood so well the test of extensive empirical
checks against the "real world."

*The model was developed over two centuries ago by Abraham DeMoivre for a
different purpose. Its possibilities as a model for an error distribution were not recognized
until some 50 years after its original discovery. See Helen M. Walker, "Bi-Centenary
of the Normal Curve," Journal of the American Statistical Association, Vol. 29
(March 1934), pp. 72-75.
†The dashed line in Figure 7.1 presents such a graphical plot for the function (X - X̄)/S.
This mathematical expression, which we represented by z, is said to be a function of X,
since its value depends upon (i.e., is a function of) that of X.
‡As, for example, measuring the length of some object. Or, for a less practical but
perhaps intuitively more obvious example, consider the operation of tossing from a well-shaken
container some "large" number of pennies where the outcome is the number of
heads. It is to be expected that the number of heads will be somewhere in the vicinity
of half the number of coins in the container. On some repetitions this number will be
nearer the half expected than on others, and occasionally it may deviate rather markedly.
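
The penny-tossing operation described in the footnote is easy to imitate on a computer. The following is only a rough simulation sketch: the number of coins (100), the number of repetitions (5000), and the fixed random seed are arbitrary choices of ours, not values taken from the text.

```python
import random
from collections import Counter

def toss_pennies(n_coins, rng):
    """Number of heads in one toss of n_coins fair pennies."""
    return sum(rng.random() < 0.5 for _ in range(n_coins))

rng = random.Random(0)              # fixed seed so the run is repeatable
n_coins, repetitions = 100, 5000
heads = [toss_pennies(n_coins, rng) for _ in range(repetitions)]

counts = Counter(heads)
# Crude text histogram: one '#' per 25 repetitions.
for k in range(35, 66, 5):
    print(f"{k:3d} heads: {'#' * (counts[k] // 25)}")

mean_heads = sum(heads) / repetitions
print("mean number of heads:", round(mean_heads, 2))
```

The printed histogram should pile up near 50 heads and thin out in both directions, which is the unimodal, symmetrical pattern the normal curve is meant to model.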




The particular mathematical function to which we refer is:

$$Y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X-\mu)^2}{2\sigma^2}} \tag{8.1}$$

where Y = the value of the function itself, i.e., the value of the ordinate
          in the graphical plot;
      X = the magnitude of an estimate or measurement of some true
          value;
      μ = the mean of the X's, i.e., the true value if the estimating
          procedure is free from bias*;
      σ = the standard deviation of the X's*;
      π ≈ 3.1416, i.e., the ratio of the circumference of a circle to its
          diameter; and
      e ≈ 2.7183, i.e., the value of the limit of a certain theoretically
          important mathematical series which is used as the base of
          the system of natural logarithms.
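
For readers with access to a computer, (8.1) can be evaluated directly. The sketch below is a minimal illustration; the function name `normal_ordinate` is our own, and the values μ = 50 and σ = 10 are those used for Figure 8.1.

```python
from math import exp, pi, sqrt

def normal_ordinate(x, mu, sigma):
    """Height Y of the normal curve (8.1) at the point x."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

# Ordinates for mu = 50 and sigma = 10 at X = 20, 25, ..., 80.
for x in range(20, 81, 5):
    print(x, f"{normal_ordinate(x, 50, 10):.4f}")
```

The ordinates rise to a maximum at X = μ and are the same at equal distances on either side of it.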


FIGURE 8.1 Plot of function of (8.1) when μ = 50 and σ = 10
for selected values of X (Y-scale from .0050 to .0400; X-scale from 20 to 80,
with the corresponding x-scale from -30 to +30)


*We have previously used X̄ or M to represent the mean and S to represent the
standard deviation of a collection of scores. The reasons for introducing new symbols
to represent mean and standard deviation in this model can best be explained later. It is
sufficient to note at this point that X̄ or M and S have been in general applied to real
collections of data. In (8.1) we are dealing with a model of a purely hypothetical or
theoretical collection. We will later find it convenient to use different symbols for these
two types of collections and have consequently elected to introduce the symbolic
variation at this point even though the need for such variation


Figure 8.1 shows a plot of (8.1) for a situation involving μ = 50 and
σ = 10.* The function has been evaluated only for selected values of X.
These values are presented in Table 8.1 and are represented on the plot by
the large dots. The complete curve was then sketched in by using these
dots as guides. It is not essential that the student be able to verify the Y
values given in Table 8.1. They were determined with the help of tables
giving the values of e^(-t) for various values of t. Lacking tables of e^(-t), it is
possible to evaluate (8.1) using logarithms. Later, however, we shall introduce
tables which will make direct evaluation of (8.1) unnecessary so that
the non-mathematical student need feel no compulsion to master the
mathematics necessary to such evaluation.

TABLE 8.1 Values of Function of (8.1) when μ = 50 and σ = 10 for
Selected Values of X

 X       Y
20     .0004
25     .0018
30     .0054
35     .0130
40     .0242
45     .0352
50     .0399
55     .0352
60     .0242
65     .0130
70     .0054
75     .0018
80     .0004

There are two important characteristics of (8.1) which should be mentioned
at this point in our discussion. First, it can be mathematically
proved that the total area under the graphic plot of (8.1), that is, the area
between the curve and the X-axis, is unity.† Since in graphical plots of
frequency distributions we are accustomed to representing frequencies by
areas (see Sections 2.3 and 2.7), it follows that (8.1) provides a model of a
relative rather than an ordinary frequency distribution, since in any relative
frequency distribution the sum of the relative frequencies is unity (see
Section 3.7).

Second, (8.1) is a continuous curve. This means that for any value
on a continuous (unbroken) X-scale there is a value of the function (8.1).
This implies that our estimates (i.e., the X's) must be capable
of taking any possible value on the continuous X-scale. We have previously
noted that man is incapable of distinguishing between adjacent points on a
continuous scale. Also there is an infinity of different values (points) between
any two non-adjacent values (points) on a continuous scale. To
obtain estimates corresponding to all possible values, then, would imply
an infinity of estimates, an obviously impossible achievement in the real
world.

*Note only the scale labeled X at this point. The scale labeled x will be discussed later.
†I.e., equal to the area of a square with sides of unit length, or one square unit.
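
The statement that the total area under (8.1) is unity is proved mathematically, but it can also be checked numerically. The sketch below uses a simple trapezoidal approximation of our own choosing (it is not a method described in the text), with μ = 50 and σ = 10, and integrates over eight standard deviations on either side of the mean on the assumption that the area beyond that range is negligible.

```python
from math import exp, pi, sqrt

def normal_ordinate(x, mu, sigma):
    """Height Y of the normal curve (8.1) at the point x."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def area_under_curve(mu, sigma, lo, hi, steps=10_000):
    """Trapezoidal approximation to the area between lo and hi."""
    width = (hi - lo) / steps
    total = 0.5 * (normal_ordinate(lo, mu, sigma) + normal_ordinate(hi, mu, sigma))
    for i in range(1, steps):
        total += normal_ordinate(lo + i * width, mu, sigma)
    return total * width

area = area_under_curve(50, 10, 50 - 80, 50 + 80)
print(round(area, 6))
```

Repeating the calculation with other values of μ and σ should give the same result, since the factor 1/(σ√(2π)) in (8.1) is what keeps the area at one square unit.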

It follows, then, that no real collection of real estimates can be truly
normally distributed, that is, truly represented by the model of (8.1).
Actually, our primary interest in (8.1) will be as a model of a relative frequency
distribution of a purely hypothetical or ideal collection of an infinity
of estimates of the same true value, among which estimates are values corresponding
to all possible points on a continuous scale. While such an ideal
collection may never be achieved in the real world, the fact remains that
relative frequency distributions of very large real collections of real estimates
have been found to fit this ideal model very closely: closely enough,
at least, to demonstrate its practical usefulness.

We have repeatedly pointed out that subtracting a constant from each
score in a given distribution does not affect the variability or the form of
the distribution but only lowers the mean by an amount equal to that of
the constant subtracted. If, instead of a model of a relative frequency
distribution of estimates, we wish a model of the relative frequency distribution
of errors, we need only subtract the true value, μ, from each estimate,
X. The resulting distribution in the case of the model distribution of
(8.1), μ = 50, σ = 10, is pictured in Figure 8.1 with the error scale being
labeled x (i.e., x = X - μ). As indicated, the error distribution is like the
X-distribution except that it is centered on zero instead of μ. The normal-curve
function expressed as an error distribution can be derived directly
from (8.1) by simply substituting x for X - μ. That is,

$$Y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}} \tag{8.2}$$

where x = X - μ.


We have also seen that conversion or transformation of X-scores to
z-scores results in a distribution having a mean of zero and a standard
deviation of one unit, with the same form as the original distribution. The
normal-distribution function in standard-score form is, therefore, given by

$$y = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}} \tag{8.3}$$

where z = x/σ = (X - μ)/σ.

The function (8.3) is obtained from (8.2) by letting z = x/σ and remembering
that, for any collection of z-scores, σ = 1. A lower-case y has been used
to represent the value of the function to call attention to the fact that y
is expressed in terms of a unit different from Y. This follows simply as a
result of the fact that z is expressed in terms of a different unit from X or x.
The function (8.3) is pictured in Figure 8.2. The y values plotted as guide
points are shown in Table 8.2. The change in the z- and y-scales is of such
character as to maintain a single unit of area under the curve. Hence the
function (8.3) may be used to serve as a model of a relative frequency distribution
of measures expressed in standard-score (z) form. It is important
to recall from the preceding chapter that z-scale values may be interpreted
as indicating distances from the mean in units of standard deviation.

FIGURE 8.2 Values of y (8.3) for given values of z (z-scale from -3 to +3)

TABLE 8.2 Values of y (8.3) for Selected Values of z

   z        y
 -3.0     .0044
 -2.5     .0175
 -2.0     .0540
 -1.5     .1295
 -1.0     .2420
 -0.5     .3521
  0       .3989
 +0.5     .3521
 +1.0     .2420
 +1.5     .1295
 +2.0     .0540
 +2.5     .0175
 +3.0     .0044
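
The reason the change of scale preserves a single unit of area can be written out in one line. The following is a sketch using the substitution z = x/σ of (8.3); the integral notation is ours and is not used in the text.

```latex
\int_{-\infty}^{\infty} Y \, dx
  = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^{2}/(2\sigma^{2})}\, dx
  = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}\, dz
  = \int_{-\infty}^{\infty} y \, dz = 1,
\qquad \text{since } x = \sigma z,\quad dx = \sigma\, dz .
```

In words: when the horizontal unit is divided by σ, the ordinate is multiplied by σ (y = σY), and the two changes cancel in the area.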




8.3 SOME PROPERTIES OF THE NORMAL CURVE


We have already noted (1) that the normal-distribution function is
positive for all values of X; (2) that its greatest or maximum value occurs
when X = μ; (3) that it is a continuous function; (4) that its graph is
symmetrical and bell-shaped; and (5) that the area between its graph and
the X-axis is unity.

There are several other characteristics of this function that should also
be considered. First, the actual range of this model distribution is from
-∞ to +∞. Inspection of Figure 8.1 shows, however, that for values of
X deviating from μ by more than 3 standard deviations (σ's) the value of
the function (Y) is very near to zero. In fact, it can be shown that the area
under the two parts of the curve that lie more than 3σ's away from μ is
only .26 per cent (i.e., 26 ten-thousandths) of the total area under the
curve. Since this is a rather negligible portion, it is common practice to
regard the practical range of this distribution function as 6σ's (i.e., as
extending from 3σ's below μ to 3σ's above μ).
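
The size of the area lying more than 3σ's from μ can be checked with the error function from Python's standard library. This is a sketch under our own naming; it relies on the standard identity that the proportion of a normal curve's area within k standard deviations of the mean equals erf(k/√2). Computed this way the two tails come to roughly .27 per cent; the figure of .26 per cent quoted in the text is what results from doubling the four-place table value .4987 and subtracting from one.

```python
from math import erf, sqrt

def area_within(k):
    """Proportion of a normal curve's area within k standard deviations of the mean."""
    return erf(k / sqrt(2))

tail_beyond_3 = 1 - area_within(3)
print(f"{100 * tail_beyond_3:.2f} per cent lies more than 3 sigma from the mean")
```

Either way the portion is negligible, which is the basis for treating 6σ's as the practical range.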

Second, the points of inflection on the curve occur one standard deviation
on either side of μ. A point of inflection on a curve separates arcs
which bend in opposite directions. Pretend, for example, that the normal
curve pictured in Figure 8.3 shows the cross-section of a hill or mound
which you are to climb, proceeding from left to right. As you climb, the
slope becomes increasingly steeper up to a point (A) after which the steepness
decreases until you reach the top. Had you climbed the opposite side,
or were you to descend by the opposite side, you would also arrive at a point
(B) at which the slope would change from increasing to decreasing in steepness.
These two points (A and B) are points of inflection, and in the case
of the normal curve are located one standard deviation to either side of the
center (μ).

FIGURE 8.3 Normal curve of (8.3) showing points
of inflection A and B (z-scale from -3 to +3)

Third, it should be noted that (8.1) and (8.2) represent many different
curves, each of which is a normal curve. In other words and speaking
strictly, there is no such thing as the normal curve but rather there is a
family or class of curves each of which is a normal curve. The members of
the family differ with respect to μ and σ if (8.1) is used or only with respect
to σ if (8.2) is used. Variation in μ does not affect the appearance of the
curve but simply determines its central location on the scale. Variation in
σ, however, has considerable effect upon the appearance of the curve, making
it broad (spread out) or narrow (compact). This characteristic of the
function is essential if it is to be successful as a model of distributions of
estimates or errors. Clearly, depending upon the accuracy of the estimating
procedure, the estimates or errors will or will not differ markedly. By varying
the size of σ we can make our model represent the product of either
accurate or inaccurate estimating procedures. Figure 8.4 shows three normal
curves of the type (8.2) superimposed on the same axes. These curves
are all centered on zero and are of unit area but have standard deviations
of 2.5, 5, and 10. Considering the marked variation in the appearance of
these three normal curves, it should be obvious that it would be extremely
difficult to tell by visual inspection alone whether or not a given unimodal
symmetrical curve satisfied the conditions of (8.1) or (8.2), that is, was a
normal curve. This difficulty is further aggravated by the fact that the
appearance of any curve may be manipulated to a degree by the choice of
the physical distances representing units along the X- and Y-scales (see
Section 2.10; note particularly Figures 2.14 and 2.15). It is not surprising
then to learn that certain real collections of measures which were more or
less bell-shaped have been mistakenly described as normally distributed.

FIGURE 8.4 Three normal curves with μ = 0 and unit area but
with varying values of σ (x-scale from -30 to +30)

Fourth, the height (Y) of any normal curve at a point deviating from μ
by some specified σ-distance is always the same percentage (i.e., proportion)
of the center height.* The values of these percentages for a few selected
standard-deviation distances from the center are given in Table 8.3. These
percentages make it relatively simple for the non-mathematical student to
sketch a curve which will satisfy the specifications of a normal curve. It is
necessary only to select some center height and then, at the various σ-distances
from center, locate guide points which are the stated percentages
of this center height above the X-axis. The curve may then be drawn
through these guide points. An illustrative sketch is shown in Figure 8.5.
It should be obvious to the student that, by varying the physical distances
representing the center height and one standard deviation, it is possible to
make normal curves which range in appearance from very flat and broad to
very peaked and narrow. Yet, so long as the specifications of Table 8.3 are
followed, the resulting curves will be normal. One other point should be
noted with regard to normal curves thus constructed. Unless the center
height, regardless of the physical distance selected to represent it, is
considered as having the value 1/(σ√(2π)) (i.e., .3989/σ), the area under the
curve will not be unity. This point will be treated further in a subsequent
section.

TABLE 8.3 Percentage Y Is of Center Height
for Selected σ-Distances Above μ†

σ-Distance Above μ     Per Cent
       0.0              100.0
       0.5               88.3
       1.0               60.7
       1.5               32.5
       2.0               13.5
       2.5                4.4
       3.0                1.1

FIGURE 8.5 Sketch of normal curve using a center height of 2 1/2
inches and representing 1σ by 5/8 of an inch (horizontal scale from -3.0σ
to +3.0σ; the guide points at 0.5σ on either side of μ are marked 88.2%)

*The student with some training in mathematics will recognize that this must be so
from inspection of (8.1). The non-mathematical student need feel responsible only
for gaining an understanding of the point being made and should simply assume its
mathematical accuracy.
†Since the curve is symmetrical, the percentages need be given only for distances
above μ.
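
The fixed percentages referred to in the fourth property follow from (8.1): dividing the height at a point z σ-distances from μ by the height at μ leaves only the factor e^(-z²/2), whatever the values of μ and σ. A minimal sketch (the function name is ours):

```python
from math import exp

def percent_of_center_height(z):
    """Height at z sigma-distances from the mean, as a per cent of the center height."""
    return 100 * exp(-z ** 2 / 2)

for z in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
    print(f"{z:.1f}  {percent_of_center_height(z):6.2f}")
```

These percentages can serve directly as guide points for sketching a normal curve in the manner described above.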

The fifth and final characteristic to be considered in this section is
definitely the most important of all. Without any attempt to provide
mathematical bases, we refer to the fact that in any normal curve the percentage
(fraction or proportion) of the area between μ and a point deviating
from it by some specified σ-distance is always the same. The values of
these percentages for a few selected σ-distances are shown in Table 8.4.

TABLE 8.4 Percentages of Area Between Center and Points Selected
σ-Distances Above μ*

σ-Distance Above μ     Per Cent
       0.0              00.00
       0.5              19.15
       1.0              34.13
       1.5              43.32
       2.0              47.72
       2.5              49.38
       3.0              49.87

*Since the curve is symmetrical, the percentages need be given only for distances
above μ.



Figure 8.6 shows the area between the center (0) of a normal curve of type
(8.3) and z = 1, that is, a point 1σ above center. This area is 34.13 per
cent of the total area under the curve (see Table 8.4), and this fact is true
of any normal curve regardless of the values of μ and σ.

FIGURE 8.6 Normal curve of (8.3) showing area
between 0 and 1 (z-scale from -3 to +3)


That this characteristic is of utmost importance follows from the fact 
that, in the graphical plot of a frequency distribution, frequencies are 
ordinarily represented by areas. Since our interest in the normal curve is 
as a model of an ideal frequency distribution, information regarding the 
normal-curve areas over certain segments of the score scale represents 
information regarding the relative frequencies with which scores fall into 
these segments in the ideal distribution. Such information makes it possible 
to provide a rather complete description of any ideal normal-frequency 
distribution. 

Suppose, for example, that we wish to know for such a distribution
the percentile ranks (PR's) of the score points 3.0σ + μ, 2.5σ + μ,
2.0σ + μ, ..., -3.0σ + μ. From Table 8.4 we see that the PR of 3.0σ + μ
is 49.87 + 50.00 = 99.87. This follows from the fact that 49.87 per cent of
the area under any normal curve falls between μ and 3.0σ + μ. Since 50
per cent necessarily falls below μ, the total percentage falling below 3.0σ
above μ is 99.87. Other PR's for σ-distances above μ can similarly be
determined simply by adding 50 per cent to the appropriate values in
Table 8.4. To determine the PR's for σ-distances below μ, we must subtract
the appropriate values in Table 8.4 from 50 per cent. Thus the PR of
-1.0σ + μ (i.e., 1σ below μ) is 50.00 - 34.13 = 15.87; for, since 50 per
cent of the area under any normal curve falls below μ and 34.13 per cent
between μ and either 1σ above or below it, it follows that for any normal
curve the percentage of the area below -1.0σ + μ is given by the difference
between 50.00 and 34.13. The percentile ranks for all the score points in
question are given in Table 8.5. To emphasize the fact that these PR's
hold for any normal curve, the score scales for specific curves of the types
(8.1) and (8.2) as well as the scale for (8.3) are all shown in this table.


TABLE 8.5 PR's for Selected Score Points of Normal
Distributions

Type (8.1)        Type (8.2)       Type (8.3)      Any Normal
μ = 50, σ = 10    μ = 0, σ = 10    μ = 0, σ = 1    Distribution      PR
      X                 x                z
     80                30               3.0         3.0σ + μ        99.87
     75                25               2.5         2.5σ + μ        99.38
     70                20               2.0         2.0σ + μ        97.72
     65                15               1.5         1.5σ + μ        93.32
     60                10               1.0         1.0σ + μ        84.13
     55                 5               0.5         0.5σ + μ        69.15
     50                 0               0               μ           50.00
     45                -5              -0.5        -0.5σ + μ        30.85
     40               -10              -1.0        -1.0σ + μ        15.87
     35               -15              -1.5        -1.5σ + μ         6.68
     30               -20              -2.0        -2.0σ + μ         2.28
     25               -25              -2.5        -2.5σ + μ         0.62
     20               -30              -3.0        -3.0σ + μ         0.13

Or, again using the area facts of Table 8.4, we see that in any ideal
normal distribution 99.74 per cent (i.e., 49.87 + 49.87) of the scores fall
between the points 3σ below and 3σ above μ. This is consistent with the
statement made at the outset of this section that only .26 per cent of the
scores differ from μ by 3σ or more. Similarly, the facts of Table 8.4 show
that 95.44 per cent (i.e., 47.72 + 47.72) of the scores fall within 2σ of μ,
and that 68.26 per cent (i.e., 34.13 + 34.13) fall within 1σ of μ. This last
cited fact is the basis for the rough generalization that two-thirds (66.67
per cent) of the scores of a normal distribution fall within 1σ of μ.

One set of facts about normal-curve areas which we will subsequently
find particularly useful are the deviations from μ in σ-units which are
exceeded by selected percentages of the score deviations in an ideal normal-score
distribution. These σ-distances are given in Table 8.6. Figure 8.7
shows, in the case of 5 per cent, how a deviation of 1.96σ is exceeded by
2.5 per cent of the deviations of scores falling at the upper end of the ideal
distribution and by another 2.5 per cent of the deviations of scores falling
at the lower end.

TABLE 8.6 σ-Distances from μ Exceeded by a Stated Percentage of the
Score Deviations

Per Cent Deviating     σ-Distance from μ (z)
       20                    1.28
       10                    1.64
        5                    1.96
        2.5                  2.24
        2                    2.33
        1                    2.58
        0.5                  2.81
        0.1                  3.29

FIGURE 8.7 Normal curve of (8.3) showing 5 per cent of
area as 1.96σ or more away from center (μ = 0) (z-scale marked at -1.96 and +1.96)

8.4 TABLES OF ORDINATES AND AREAS FOR THE NORMAL CURVE (8.3)


We have seen that for any normal curve there exists a set relationship
between the height (Y) at a specified σ-distance from μ and the height at
μ. We have also seen that for any normal curve the area between μ and a
point at a specified σ-distance from μ is the same proportion of the total
area. Obviously then, if we know these height relationships and area proportions
for one normal curve, we know them for all normal curves. We
shall present these facts for the normal curve in standard-score form, that
is, for the function (8.3).* As we shall subsequently illustrate, it is a simple
matter to apply these facts to any normal curve by transforming from one
scale (z) to another (X or x). These facts together with other useful
information regarding (8.3) are presented for selected z-values in Table 8.7.

TABLE 8.7 Normal Curve Areas and Ordinates

Col. 1   Col. 2       Col. 3       Col. 4   Col. 5      Col. 6   Col. 7   Col. 8
         Proportion   Proportion            y as % of   PR       PR
  +z     μ to z       Beyond ±z      y      y at μ      of +z    of -z      -z
  0.0     .0000        1.0000      .3989     100.00     50.00    50.00      0.0
 +0.5     .1915         .6170      .3521      88.25     69.15    30.85     -0.5
 +1.0     .3413         .3174      .2420      60.65     84.13    15.87     -1.0
 +1.5     .4332         .1336      .1295      32.47     93.32     6.68     -1.5
 +2.0     .4772         .0456      .0540      13.53     97.72     2.28     -2.0
 +2.5     .4938         .0124      .0175       4.39     99.38     0.62     -2.5
 +3.0     .4987         .0026      .0044       1.11     99.87     0.13     -3.0

In this table:


Column 1 gives selected z-values from the upper half of the curve (8.3). 

Column 2 gives the proportion of the total area between μ and z.

Column 3 gives the proportion of the total area falling in the two extremes or ends† beyond the given z-distance from μ.

Column 4 gives the ordinates for the curve (8.3) corresponding to the given z.

Column 5 expresses the height of the curve at z as a per cent of the center height.

Column 6 gives the percentile ranks of values of z from the upper half of the curve.

Column 7 gives the percentile ranks of values of z from the lower half of the curve.

Column 8 gives selected z-values from the lower half of the curve.

There are several points which should be made regarding this collection of facts about the normal-distribution function.


1. The z-values may be interpreted as σ-distances.

2. The z-value for any X may be found by:

        z = x/σ, where x = X - μ        (8.4)

3. The X-value for any z may be found by:

        X = σz + μ        (8.5)

4. When z-values are interpreted as σ-distances as given by (8.4), Columns 2, 3, 5, 6, and 7 apply to any normal distribution function.

5. The ordinates (y) given in Column 4 apply only to the curve in standard-score form (8.3). Ordinates (Y) for any normal curve may be obtained by dividing the y-values by σ. That is,

        Y = y/σ        (8.6)

6. Columns 2, 4, and 5 apply for plus or minus z-values, that is, for z-values either above or below μ.

7. Column 6 applies only to positive z-values.

8. Column 7 applies only to negative z-values.
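The column definitions and formulas (8.4) to (8.6) can be checked numerically. The following is a minimal sketch, assuming Python's standard `statistics.NormalDist` as a stand-in for the printed table; the function names `table_row`, `to_z`, `to_X`, and `ordinate` are illustrative and do not come from the text.

```python
from statistics import NormalDist

UNIT = NormalDist(mu=0.0, sigma=1.0)  # curve (8.3): mean 0, standard deviation 1


def table_row(z):
    """Columns 2-7 of Table 8.7 for a non-negative z."""
    between = UNIT.cdf(z) - 0.5                # Col. 2: area between the mean and z
    beyond = 2.0 * (1.0 - UNIT.cdf(z))         # Col. 3: area in both tails beyond +/-z
    y = UNIT.pdf(z)                            # Col. 4: ordinate of (8.3) at z
    pct_of_center = 100.0 * y / UNIT.pdf(0.0)  # Col. 5: height as per cent of center height
    pr_plus = 100.0 * UNIT.cdf(z)              # Col. 6: percentile rank of +z
    pr_minus = 100.0 * UNIT.cdf(-z)            # Col. 7: percentile rank of -z
    return between, beyond, y, pct_of_center, pr_plus, pr_minus


def to_z(X, mu, sigma):
    """Formula (8.4): z = (X - mu) / sigma."""
    return (X - mu) / sigma


def to_X(z, mu, sigma):
    """Formula (8.5): X = sigma * z + mu."""
    return sigma * z + mu


def ordinate(X, mu, sigma):
    """Formula (8.6): Y = y / sigma, with y taken at the z corresponding to X."""
    return UNIT.pdf(to_z(X, mu, sigma)) / sigma


if __name__ == "__main__":
    for z in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
        print(f"{z:4.1f}  " + "  ".join(f"{c:8.4f}" for c in table_row(z)))
```

Computed entries may differ from the printed ones in the last decimal place, because the table derives some columns from already rounded values.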


*This particular curve is sometimes referred to as the unit or standard normal curve.
†Such end pieces are often referred to as tails.


To illustrate the application of the facts of Table 8.7 to a normal distribution function of the type (8.1) we shall consider the case in which μ = 30 and σ = 5. Assume further that this curve is serving as an approximate* model for a hypothetical distribution of 5,000 test scores.

1. What per cent of the area lies between 25 and 30? For X = 25, z = (25 - 30)/5 = -1. Column 2 shows the required percentage to be 34.13. Since area represents frequency, this implies that approximately 1,706.5 (i.e., 34.13 per cent of 5,000) of the scores in the hypothetical score distribution are between 25 and 30.

2. What per cent of the area deviates from 30 by more than 5? That is, what per cent of the area is above 35 and below 25? For x = ±5, z = ±5/5 = ±1, and Column 3 shows the required percentage to be 31.74. Again, since area represents frequency, it follows that approximately 1,587 (31.74 per cent of 5,000) of the scores in the hypothetical score distribution are either greater than 35 or less than 25.


3. What is the height of the ordinate (Y) at 25? Again, z = -1. Applying formula (8.6) to the y-value read from Column 4, we have Y = .2420/5 = .0484.

4. What per cent of the center height is the height at X = 25? Column 5 shows that for z = -1 this percentage is 60.65.


5. What is the PR of 35? Here, z = +1, and Column 6 shows the required PR to be 84.13. That is, approximately 4,206.5 (84.13 per cent of 5,000) of the scores in the hypothetical distribution are less than 35.

6. What is the PR of 25? Here, z = -1, and Column 7 shows the required PR to be 15.87, which means that 793.5 of the scores in the hypothetical score distribution involved are less than 25.

7. What approximately is the value of P₂ in this distribution? Since 2 is less than 50, we know that P₂ is a point in the lower half of the distribution. Referring to the table, we find that 2.28 per cent is the closest value to 2 per cent in Column 7. Since 2.28 corresponds to z = -2, we obtain, upon applying (8.5), P₂ = 20. I.e., P₂ = (5)(-2) + 30 = 20.

8. What approximately is the distance such that 10 per cent of the scores in this distribution deviate from 30 by more than this distance? Referring to Column 3, we see that 13.36 per cent is the closest value to 10 per cent in this table. This corresponds to a z- or σ-distance of 1.5 which is the equivalent of 5 × 1.5 = 7.5 score units. Hence, the distance required is 7.5.
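The eight questions can also be answered without a table. The sketch below assumes Python's standard `statistics.NormalDist` in place of Table 8.7; the variable names are illustrative only. Questions 7 and 8 come out noticeably different from the coarse table values (about 19.7 rather than 20, and about 8.2 rather than 7.5), which shows how rough the nearest-entry approximation is.

```python
from statistics import NormalDist

scores = NormalDist(mu=30, sigma=5)  # model for the hypothetical score distribution
N = 5000                             # number of scores in that distribution

pct_25_to_30 = 100 * (scores.cdf(30) - scores.cdf(25))          # question 1
pct_beyond_5 = 100 * (scores.cdf(25) + (1 - scores.cdf(35)))    # question 2
height_at_25 = scores.pdf(25)                                   # question 3
pct_of_center_at_25 = 100 * scores.pdf(25) / scores.pdf(30)     # question 4
pr_of_35 = 100 * scores.cdf(35)                                 # question 5
pr_of_25 = 100 * scores.cdf(25)                                 # question 6
p2 = scores.inv_cdf(0.02)                                       # question 7: 2nd percentile
distance_10_pct = scores.inv_cdf(0.95) - scores.mean            # question 8: 5 per cent in each tail

if __name__ == "__main__":
    print(f"1. {pct_25_to_30:.2f} per cent, about {N * pct_25_to_30 / 100:.0f} scores")
    print(f"2. {pct_beyond_5:.2f} per cent, about {N * pct_beyond_5 / 100:.0f} scores")
    print(f"3. Y = {height_at_25:.4f}")
    print(f"4. {pct_of_center_at_25:.2f} per cent of center height")
    print(f"5. PR of 35 = {pr_of_35:.2f}")
    print(f"6. PR of 25 = {pr_of_25:.2f}")
    print(f"7. P2 = {p2:.2f}")
    print(f"8. distance = {distance_10_pct:.2f}")
```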

Clearly, the answers to the last two questions can only be roughly approximated from the information given in Table 8.7. The same would be true were we to ask, for example, for the PR of a score of 36. Here z = +1.2, a value not included in the table. The best we could do would be to use the PR for 35 (i.e., for z = +1.0) or perhaps make some interpolation between the PR for 35 and the PR for 40. Obviously, if our information about normal curves is to be at all precise, we shall need a table which is much more complete than Table 8.7. This is provided in Table II, Appendix C, pp. 502-509. This table is identical with Table 8.7 in design and may be used in precisely the same manner. Except at the extremes, it provides the facts for z-values differing by .01.

*The curve can only approximate such a distribution since it is continuous and, strictly speaking, can represent only an infinity of measures of a continuous and normally distributed trait.


8.5 PROBABILITY 


The words probable and probability, at least in a loose sense, are common to the vocabularies of even elementary school children.* They are, of course, used to refer to the likelihood of occurrence of uncertain events. The phrases "probable showers" and the "probability of (winning a basketball) victory" are illustrative of such common usage.

A more precise notion of probability is fundamental to certain very 
important aspects of statistics. We shall make frequent use of the concept 
in subsequent chapters. In spite of its importance in statistics, the concept 
of probability is difficult to define satisfactorily. Several quite different 
approaches are to be found in the mathematical literature on the subject. 
We shall make no attempt at rigor. Instead, we shall try to provide the 
student with an admittedly oversimplified set of notions which it is hoped will be sufficient, nevertheless, to enable him to develop some appreciation of the role† of probability in statistics.

To begin, consider a collection or universe of objects. We shall designate this universe as U. Now suppose that the objects comprising U are of several different kinds. Let one of these kinds or classes of objects be called w. Then the probability of an object of type w in the universe of objects, U, is by definition the relative frequency (expressed as either a common or decimal fraction) with which type w objects occur in this universe.

For example, suppose the objects of the universe are the individual cards comprising an ordinary 52-card deck of playing cards. Then the probability of a spade in this universe is one-fourth or .25, since 13 of the 52 cards involved are spades.

If we let f_w represent the number of type w objects in U, and N the total number of objects in U, then by definition the probability of a w in U is f_w/N. This statement may be expressed symbolically as follows:

        P(w | U)‡ = f_w/N        (8.7)
*Probable falls in the third thousand and probability in the fifth thousand of the words most frequently used. (E. L. Thorndike and Irving Lorge, The Teacher's Word Book of 30,000 Words, Bureau of Publications, Teachers College, Columbia University, New York, 1944.)
†The role here referred to will only be hinted at in this chapter. Its introduction will be directly undertaken in subsequent chapters.
‡Read: "The probability of a w in universe U."


Applying this notation to our playing card example, we would write

        P(spade | ordinary deck) = 13/52 = 1/4 = .25

or

        P(king | ordinary deck) = 4/52 = 1/13 = .077

It is important to specify the universe involved. For example, simply to speak of the probability of a king as one-thirteenth is not generally true, since

        P(king | pinochle deck*) = 8/48 = 1/6

Of course, if the universe is clearly specified and is the only universe in question so that no possible misunderstanding can arise, the notation of (8.7) may be abridged by writing simply

        P(w) = f_w/N        (8.7a)


It should be clear that any universe which involves objects of type w must necessarily involve objects which are not of type w, that is, nw-type objects. Let the number of nw-type objects in U be represented by f_nw. Then applying (8.7), we may write

        P(nw | U) = f_nw/N

But,

        f_w + f_nw = N

Hence, it follows that

        P(w | U) + P(nw | U) = 1        (8.8)

for

        f_w/N + f_nw/N = (f_w + f_nw)/N = N/N = 1
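Definition (8.7) and the complement relation (8.8) amount to counting. The following is a minimal sketch, assuming each card is represented as a (rank, suit) pair; the helper `probability` and the deck constructions are illustrative and not part of the text. The pinochle deck is built as two copies of the 9 through ace in each suit, which gives the 48 cards and 8 kings mentioned in the footnote.

```python
from fractions import Fraction

SUITS = ("spades", "hearts", "diamonds", "clubs")
ORDINARY_RANKS = ("A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K")
PINOCHLE_RANKS = ("9", "10", "J", "Q", "K", "A")

# Universe 1: an ordinary 52-card deck.
ordinary_deck = [(rank, suit) for suit in SUITS for rank in ORDINARY_RANKS]
# Universe 2: a pinochle deck, two copies of each of 24 distinct cards.
pinochle_deck = [(rank, suit) for suit in SUITS for rank in PINOCHLE_RANKS for _ in range(2)]


def probability(universe, is_type_w):
    """P(w | U) = f_w / N, formula (8.7)."""
    f_w = sum(1 for obj in universe if is_type_w(obj))
    return Fraction(f_w, len(universe))


def is_spade(card):
    return card[1] == "spades"


def is_king(card):
    return card[0] == "K"


def is_not_king(card):
    return not is_king(card)


if __name__ == "__main__":
    print(probability(ordinary_deck, is_spade))   # 13/52 reduced
    print(probability(ordinary_deck, is_king))    # 4/52 reduced
    print(probability(pinochle_deck, is_king))    # 8/48 reduced
    # Formula (8.8): P(w | U) + P(nw | U) = 1
    print(probability(ordinary_deck, is_king) + probability(ordinary_deck, is_not_king))
```

The same type of object (a king) has different probabilities in the two universes, which is why the universe must be named.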


Of course, a U may actually consist of a number of different types of objects, say w-type, x-type, y-type, and z-type. Then, if f_w, f_x, f_y, and f_z represent the numbers of each of these types in U, we may write

        P(w | U) = f_w/N
        P(x | U) = f_x/N
        P(y | U) = f_y/N
        P(z | U) = f_z/N


*A 48-card deck involving 8 kings. 




And if there are no other types of objects in U, we may again write

        P(w | U) + P(x | U) + P(y | U) + P(z | U) = 1

since

        f_w/N + f_x/N + f_y/N + f_z/N = (f_w + f_x + f_y + f_z)/N = N/N = 1

The above probability statements regarding objects of type w, x, y, and z in universe U may be presented in tabular form as in Table 8.8.


TABLE 8.8 Probabilities (P) of Various Types of Objects in U

Type of object    P
w                 f_w/N
x                 f_x/N
y                 f_y/N
z                 f_z/N
Total             N/N = 1


It is clear that Table 8.8 is nothing more than a relative frequency dis- 
tribution. When a relative frequency distribution is thus interpreted, it is 
called a probability distribution. Since any ordinary frequency distribution 
may be presented as a relative frequency distribution and any relative 
frequency distribution be interpreted as a probability distribution, it follows 
that any ordinary frequency distribution may be converted into a proba- 
bility distribution. Actually, then, we have previously presented in Section 
3.7 a scheme for representing symbolically any probability distribution. 
In this scheme the objects were scores and the universe was a collection of 
N scores. The scores were classified according to magnitude in terms of 
intervals along the score scale. The relative frequency associated with a 
given class or interval is the probability of the type of object (score) which 
belongs to (falls in) this class or interval.

By way of example, consider as U the collection of 50 scores on a 25-word anticipation test which are given in Table 5.1. If we classify scores according to magnitude (that is, if scores of the same size are regarded as a particular type), the probability distribution for the types of scores in this universe is as shown in Table 8.9. Or if we type or classify these scores according as they fall in the intervals 21-23, 18-20, 15-17, etc., the probability distribution for the types of scores is as shown in Table 8.10.

TABLE 8.9 Probability Distribution of Different Scores in Universe of Table 5.1

SCORE    PROBABILITIES
21       1/50 = .02
18       2/50 = .04
17       1/50 = .02
16       2/50 = .04
15       3/50 = .06
14       2/50 = .04
13       3/50 = .06
12       4/50 = .08
11       6/50 = .12
10       9/50 = .18
 9       7/50 = .14
 8       5/50 = .10
 7       2/50 = .04
 6       1/50 = .02
 5       1/50 = .02
 4       1/50 = .02
        50/50 = 1.00

TABLE 8.10 Probability Distribution for Certain Classes (Types) of Scores in U of Table 5.1

CLASSES    P
21-23      .02
18-20      .04
15-17      .12
12-14      .18
 9-11      .44
 6-8       .16
 3-5       .04
          50/50 = 1.00

A comparison of the probability distributions of Tables 8.9 and 8.10 illustrates one very important fact, namely, that the probability of scores of the type 9-11 as shown in Table 8.10 is the same as the sum of the probabilities of scores of types 9, 10, and 11 as shown in Table 8.9 (i.e., .44 = .14 + .18 + .12). This fact is known in probability theory as the addition rule. A more formal statement of this rule follows.


ADDITION RULE. If in a given universe U, the probability of a type w object is P(w | U) and the probability of a type x object is P(x | U), then the probability of a new type object in U which may be called an "either w- or x-" type object is the sum of these separate probabilities. Or symbolically,

        P(w or x | U) = P(w | U) + P(x | U)        (8.9)

To prove this rule let f_w and f_x be the numbers of type w and type x objects in U. Then the total number of "either w- or x-" type objects is clearly f_w + f_x, and if N is the total number of objects in U, it follows directly from the definition of probability (8.7) that

        P(w or x | U) = (f_w + f_x)/N
                      = f_w/N + f_x/N
                      = P(w | U) + P(x | U)


It should also be obvious that the addition rule may be extended to more than two probabilities, that is, to apply to new objects of the type "w or x or z, ...". Finally, it is extremely important to note that the addition rule applies only to objects in a particular U, that is, to a given probability distribution. The rule is not applicable to a new type "w or x" object formed from w-type objects in one U and x-type objects in a different U.
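The addition rule can be seen at work on the score universe of Table 8.9. The sketch below is illustrative only: the frequencies are the ones read from Table 8.9 (the score 7 entry is an assumption, since that row is damaged in the source), and the helpers `p_either` and `p_interval` are hypothetical names.

```python
from fractions import Fraction

# Frequency of each score in the universe of 50 scores (Table 8.9).
# Assumption: the damaged row with frequency 2 between scores 8 and 6 is score 7.
FREQ = {21: 1, 18: 2, 17: 1, 16: 2, 15: 3, 14: 2, 13: 3, 12: 4,
        11: 6, 10: 9, 9: 7, 8: 5, 7: 2, 6: 1, 5: 1, 4: 1}

N = sum(FREQ.values())                                    # total number of objects in U
P = {score: Fraction(f, N) for score, f in FREQ.items()}  # P(score | U) by (8.7)


def p_either(*score_types):
    """Addition rule (8.9): sum the probabilities of distinct types in the same U."""
    return sum((P.get(s, Fraction(0)) for s in score_types), Fraction(0))


def p_interval(lo, hi):
    """Probability of the class lo..hi, as in Table 8.10."""
    return p_either(*range(lo, hi + 1))


if __name__ == "__main__":
    for lo, hi in [(21, 23), (18, 20), (15, 17), (12, 14), (9, 11), (6, 8), (3, 5)]:
        print(f"{lo}-{hi}: {float(p_interval(lo, hi)):.2f}")
```

Exact fractions are used so that the class probabilities add to exactly 1, with no rounding drift.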


8.6 THE CONCEPT OF PROBABILITY AS APPLIED TO THE
OUTCOME OF AN UNCERTAIN EVENT


In beginning the foregoing section, we remarked that in common
usage the word probability is employed to refer to the likelihood that an
event of uncertain outcome will occur in a particular way. Thus it may be
said that on a given occasion the probability of rain is high (large), or the 
probability of a bumper crop is small. In this section we shall examine the 
sense in which such statements may be viewed as probability statements. 

Interpreted literally such statements refer to a single event and hence cannot be probability statements since the concept of probability refers to the relative frequency of a particular type of event (object) in a universe of a number of events some of which differ from others. Actually, however, such statements are not intended to apply to a single event and, naive as his knowledge of probability may be, the maker of them is usually cognizant of this fact. He recognizes, for example, that on the given occasion to which his statement about the probability of rain applies, it either will or will not rain. Actually he is reporting on his past experience with repeated sets of like circumstances, such as a falling barometer, a given wind direction, a given bank of clouds, and so on. He is in effect saying that he has observed a number of occasions in which the circumstances appeared to be the same as those now confronting him, and that on a certain fraction of those occasions it rained. He is extrapolating his past experience to the future in that he is assuming that as the particular set of circumstances continues to arise it will rain about the same fraction of the time as in the past. Though he may never have formalized his thinking about his particular probability statement, he is intuitively citing the relative frequency of the occurrence of rain in a universe of occasions each involving the present set of circumstances. This universe is partly real and partly hypothetical. It is partly real in that some of the elements (occasions) involve actual (real) past experience. It is partly hypothetical in that some of the elements are assumed future occasions. But the probability statement, nevertheless, has to do with the relative frequency of a certain particular type of event (rain) in a certain universe of events. To the extent that this relative frequency is large or small he states that the probability of rain is large or small.

By way of further illustration we shall similarly analyze the statement 
that the probability of drawing a spade from a well-shuffled ordinary deck 
is one-fourth. Again this statement does not apply to a single draw. It 
applies to a universe of such draws. The universe here is entirely hypo- 
thetical and is presumed to embody the theoretical totality of experience 
with this event. Its elements are the spades and non-spades arising from 
hypothetically repeating the drawing process an infinity of times. The 
statement that in this universe one-fourth of the elements are spades is 
based on the assumption that the fraction of spades in this hypothetical 
universe will be the same as the fraction of spades in the original deck. 
Obviously the probability statement is valid only to the extent that this 
assumption is valid. 
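The hypothetical universe of draws can be imitated, though never exhausted, by simulation. The following is a rough sketch, assuming each draw is made from a full, freshly shuffled deck and using Python's standard `random` module; the function name and the choice of 100,000 draws are illustrative.

```python
import random


def relative_frequency_of_spades(n_draws, seed=0):
    """Repeat the drawing process n_draws times and report the fraction of spades."""
    rng = random.Random(seed)                 # fixed seed so the run is repeatable
    deck = ["spade"] * 13 + ["other"] * 39    # 13 spades among 52 cards
    # Each draw picks one card at random from the complete deck.
    hits = sum(1 for _ in range(n_draws) if rng.choice(deck) == "spade")
    return hits / n_draws


if __name__ == "__main__":
    for n in (100, 10_000, 100_000):
        print(n, relative_frequency_of_spades(n))
```

As the number of repetitions grows, the relative frequency settles near one-fourth, which is the sense in which the probability statement describes the universe of draws rather than any single draw.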

Most of the probability distributions (universes) used in statistics, like 
that of the foregoing example, are hypothetical in character. They have 
been empirically checked, however, by the device of analyzing the outcomes
of large numbers of repetitions of the event to which they apply.
Such checks have demonstrated that so long as he limits his use of such
distributions to the situation for which they were intended, the student 
need have no question about their validity in the real world. 


8.7 THE NORMAL CURVE AS A PROBABILITY-DISTRIBUTION MODEL


Since the normal curve may be used as a model of a relative frequency 
distribution, and since any relative frequency distribution may be viewed 
as a probability distribution, it obviously follows that the normal curve 
may be used as a model of a probability distribution. 

For example, we may consider the normal curve of Figure 8.1 with μ = 50 and σ = 10 as a model of a universe of scores or measures of some continuous trait. We may now ask, "What is the probability of scores in this universe (NPD*: μ = 50, σ = 10) deviating from 50 by 10 or less?" The type of object referred to in this question is a score (X), the value of which is between 40 and 60 (i.e., 40 ≤ X ≤ 60). The relative frequency with which this type of object occurs in this universe may be read from Table 8.7. Since 10 is one standard deviation (i.e., since z = 1), Column 2 of Table 8.7 shows the relative frequency of an X between 50 and 60 to be 34.13 per cent or .3413. Since the curve is symmetrical, we know that the relative frequency of an X between 50 and 40 is also .3413. Applying the addition rule of Section 8.5, we obtain

        P(40 ≤ X ≤ 60 | NPD: μ = 50, σ = 10)
            = P(40 ≤ X ≤ 50 | NPD: μ = 50, σ = 10)
              + P(50 ≤ X ≤ 60 | NPD: μ = 50, σ = 10)
            = .3413 + .3413
            = .6826

*Read: "normal probability distribution."


Or consider the question, "What is the probability of drawing at random* a score having a value between 40 and 60 from a normally distributed universe with μ = 50 and σ = 10?" As we have seen in the foregoing section, this question actually asks, "What is the probability of an X between 40 and 60 in the hypothetical universe generated by an infinity of repetitions of the drawing process?" We shall assume that an infinity of repetitions of this drawing process would generate a universe which is like the universe from which the draws are made. Then the probability called for is the relative frequency of an X between 40 and 60 in the original universe. As we have shown above, this probability is .6826.
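Both readings of the question, area under the model curve and relative frequency among repeated random draws, can be compared numerically. This is a minimal sketch assuming Python's standard `statistics.NormalDist`; the sample size and seed are arbitrary choices.

```python
from statistics import NormalDist

universe = NormalDist(mu=50, sigma=10)

# Area reading: add the two halves, as the addition rule is used in the text.
lower_half = universe.cdf(50) - universe.cdf(40)   # P(40 <= X <= 50)
upper_half = universe.cdf(60) - universe.cdf(50)   # P(50 <= X <= 60)
p_within_10 = lower_half + upper_half

# Repeated-draw reading: a large but finite number of random draws.
draws = universe.samples(100_000, seed=1)
fraction_within_10 = sum(1 for x in draws if 40 <= x <= 60) / len(draws)

if __name__ == "__main__":
    print(f"area under the curve:      {p_within_10:.4f}")
    print(f"fraction of 100,000 draws: {fraction_within_10:.4f}")
```

The computed area is about .6827; the text's .6826 arises from doubling the rounded table entry .3413.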


8.8 EXAMPLES SHOWING HOW TABLES MAY BE USED TO OBTAIN
VARIOUS FACTS ABOUT NORMAL DISTRIBUTIONS OR
NORMAL PROBABILITY DISTRIBUTIONS


In this section we shall illustrate specifically the use of Table II of Appendix C by presenting solutions to selected types of problems. Some of the examples used call for the determination of the probability of drawing a certain type of score at random from a normally distributed universe of scores. All such problems should be interpreted as referring to the probability of the type of score in question in the hypothetical universe generated by an infinity of repetitions of the drawing process. We shall, moreover, assume that the hypothetical universe thus generated is identical with the original universe from which the draws are made.


Type 1. Area to one side of a given score point.


Example 8.1. What is the probability that a score selected at random from a normally distributed universe of scores with mean 30 and standard deviation 4 is 25 or more?

Solution. Here we need simply to determine the fraction of the area of this ND which is above 25 (see Figure 8.8). Since z = (25 - 30)/4 = -1.25, the fraction required is the same as that of the area of the unit ND (i.e., ND: μ = 0, σ = 1) which lies above -1.25. Column 2 of Table II shows the fraction of the area between μ = 0 and z = -1.25 to be .3944. To this must be added .5000 which lies above μ = 0. Hence, we have

        P(X ≥ 25 | ND: μ = 30, σ = 4) = .8944

FIGURE 8.8 Normal distribution with μ = 30, σ = 4

*By a random draw, we refer to some operation of selection which guarantees to each object of the universe an equal chance of being selected.


Comment. It should be observed that the information necessary to the solution of this problem may actually be read from Table II in a variety of ways. For example, Column 7 shows the PR of z = -1.25 is 10.56. Hence, the required fraction is given by 1.0000 - .1056 = .8944. Or, since the curve is symmetrical the percentage above any negative z is the same as the PR of the corresponding positive z, so that the required fraction may actually be read directly from Column 6.* In the following examples only one method of solution will be indicated. The student should, however, consider various alternative ways in which Table II provides a given item of information about a normal distribution. In this way he will not only fully familiarize himself with the character of the information contained in the table but also can learn the most efficient ways in which to use the table.


Example 8.2. Given a ND collection of IQ scores with μ = 100 and σ = 16. What is the PR of a score of 128 in this collection?

Solution. Here z = (128 - 100)/16 = +1.75 and from Column 6 of Table II we read directly that

        PR(z = +1.75) = PR(IQ = 128) = 95.99
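Type 1 problems reduce to a single evaluation of the cumulative area. A minimal sketch, assuming Python's standard `statistics.NormalDist` in place of Table II; `prob_at_or_above` and `percentile_rank` are illustrative names.

```python
from statistics import NormalDist


def prob_at_or_above(x, mu, sigma):
    """Fraction of the area of ND(mu, sigma) lying above x."""
    return 1.0 - NormalDist(mu, sigma).cdf(x)


def percentile_rank(x, mu, sigma):
    """Per cent of the area of ND(mu, sigma) lying below x."""
    return 100.0 * NormalDist(mu, sigma).cdf(x)


if __name__ == "__main__":
    print(f"Example 8.1: {prob_at_or_above(25, 30, 4):.4f}")
    print(f"Example 8.2: {percentile_rank(128, 100, 16):.2f}")
    # Alternative route from the Comment: 1 minus the PR of z = -1.25 as a fraction.
    print(f"Comment:     {1.0 - percentile_rank(25, 30, 4) / 100.0:.4f}")
```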
Type 2. Area between two score points.

Example 8.3. Given a ND collection of measures with μ = 40 and σ = 6. If a score is selected at random from this universe, what is the probability that its value is between 36 and 48?

Solution. Here we seek the fraction of the area of this ND which lies between 36 and 48 (see Figure 8.9). Since

        z₁ = (36 - 40)/6 = -.67 and z₂ = (48 - 40)/6 = +1.33,

we see from Column 2 of Table II that the fraction of the area between 36 and 40 is .2486, while the fraction between 40 and 48 is .4082. Hence,

        P(36 ≤ X ≤ 48 | ND: μ = 40, σ = 6) = .2486 + .4082 = .6568.

FIGURE 8.9 Normal distribution with μ = 40, σ = 6

*The values given in Column 6 are percentages. Since probabilities are not traditionally expressed as percentages, the value 89.44 in Column 6 should be read as a decimal fraction, that is, as .8944.


Example 8.4. In the distribution of the foregoing example, what is the probability of a score selected at random having a value between 27 and 35?

Solution. Here we seek the fraction of the area between 27 and 35. One way to obtain this percentage from Table II is to note from Column 2 that since z₁ = (27 - 40)/6 = -2.17, the fraction of the area between 27 and 40 is .4850; and that since z₂ = (35 - 40)/6 = -0.83, the fraction of the area between 35 and 40 is .2967. Hence, the fraction of the area between 27 and 35 is .4850 - .2967 = .1883 (see Figure 8.10). Therefore,

        P(27 ≤ X ≤ 35 | ND: μ = 40, σ = 6) = .1883

FIGURE 8.10 Normal distribution with μ = 40, σ = 6


Example 8.5. What is the relative frequency of an X of 34 in a ND universe of X's for which μ = 50 and σ = 10?

Solution. Whenever we use the normal curve as a model of a relative frequency distribution, the relative frequencies are represented by areas. But there can be no segment of area above a score point, since the score point has no width. However, in reporting measurements of continuous attributes we usually report to the nearest unit point. Thus a score of 34 reported for a particular object implies that we have determined the "true" amount of this attribute as possessed by this object to be somewhere between 33.5 and 34.5. That is to say, in terms of this X-scale any object which is measured as possessing between 33.5 and 34.5 units of the trait in question is reported as possessing 34 units. Hence, the relative frequency of 34 is represented in the normal-curve model by the proportion of the area between 33.5 and 34.5 (see Figure 8.11). The solution of this example now follows precisely that sketched for the foregoing example. From Column 2 of Table II we note that since z₁ = (33.5 - 50)/10 = -1.65, the proportion of the area between μ = 50 and 33.5 is .4505; and that since z₂ = (34.5 - 50)/10 = -1.55, the proportion of the area between μ and 34.5 is .4394. Hence, the proportion of the area between 33.5 and 34.5 (the required relative frequency) is .4505 - .4394 = .0111.

FIGURE 8.11 Section of X-scale for the normal distribution with μ = 50, σ = 10 [scale marked 32, 33, 34, 35, 36]
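Type 2 problems are differences of two cumulative areas. The sketch below assumes Python's standard `statistics.NormalDist`; `prob_between` is an illustrative name. Because the code does not round z to two decimals before looking it up, its answers for Examples 8.3 and 8.4 differ from the table-based ones in the third decimal place.

```python
from statistics import NormalDist


def prob_between(lo, hi, mu, sigma):
    """Fraction of the area of ND(mu, sigma) between the score points lo and hi."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(hi) - dist.cdf(lo)


if __name__ == "__main__":
    print(f"Example 8.3: {prob_between(36, 48, 40, 6):.4f}")
    print(f"Example 8.4: {prob_between(27, 35, 40, 6):.4f}")
    # Example 8.5: a reported score of 34 stands for the interval 33.5 to 34.5.
    print(f"Example 8.5: {prob_between(33.5, 34.5, 50, 10):.4f}")
```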


Type 3. Area beyond (above and below) two score points.

Example 8.6. If a score is selected at random from a ND universe with μ = 60 and σ = 8, what is the probability that its value differs from the mean (60) by 10 or more points?

Solution. We seek the fraction of the area of this ND which is below 50 and above 70 (see Figure 8.12). Here 10 score points correspond to a σ- or z-distance of 10/8 = 1.25. From Column 3 of Table II we read directly that the fraction of the area beyond z = ±1.25 is .2113. Hence,

        P(X ≤ 50 or X ≥ 70 | ND: μ = 60, σ = 8) = .2113.

FIGURE 8.12 Normal distribution with μ = 60, σ = 8
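For Type 3 problems the two tails are equal by symmetry, so one tail may be doubled. A minimal sketch, assuming Python's standard `statistics.NormalDist`; the function name is illustrative.

```python
from statistics import NormalDist


def prob_deviating_by_at_least(distance, sigma):
    """Area in both tails beyond a given score distance from the mean (Column 3)."""
    z = distance / sigma                        # convert the distance to sigma-units
    return 2.0 * (1.0 - NormalDist().cdf(z))    # double the upper tail


if __name__ == "__main__":
    print(f"Example 8.6: {prob_deviating_by_at_least(10, 8):.4f}")
```

The mean does not enter the calculation, since only the distance from it matters.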


Type 4. Score point corresponding to a given area on one side.

Example 8.7. What is the value of D₂ in a ND collection of X's with μ = 25 and σ = 5?

Solution. Here we seek a value on the X-scale such that 20 per cent of the area of the given ND lies below it (see Figure 8.13). Since this X will obviously fall in the lower half of this ND, the value of the corresponding z will be negative. We, therefore, look into Column 7 of Table II for 20 per cent. The exact value 20 per cent is not to be found in this table, but we shall be satisfied to use the value nearest to 20 per cent, namely, 20.05 per cent. This value corresponds to z = -.84. Now applying formula (8.5) we have X = (5)(-.84) + 25 = 20.8. Hence, in this ND

        D₂ = 20.8

FIGURE 8.13 [score scale with the point D₂ = ? marked]


Example 8.8. In the ND of the foregoing example, what is the value of X₁ such that P(X ≥ X₁) = .1?

Solution. Here the z-value corresponding to the required X₁ lies in the upper half of the ND and, hence, is positive. The z-value required is the same as that which has PR = 90 per cent (see Figure 8.14). The closest value to 90 per cent in Column 6 of Table II is 89.97 per cent, which corresponds to z = +1.28. Hence, X₁ = (5)(+1.28) + 25 = 31.4, and in this ND

        P(X ≥ 31.4) = .1

FIGURE 8.14 Normal distribution with μ = 25, σ = 5 [scale marked 10, 15, 20, 25, 30, 35, 40]
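Type 4 problems run the table backwards, from an area to a score point. The sketch below assumes the inverse cumulative function `inv_cdf` of Python's standard `statistics.NormalDist` as a substitute for searching Columns 6 and 7; `score_with_area_below` is an illustrative name.

```python
from statistics import NormalDist


def score_with_area_below(p, mu, sigma):
    """Score point X of ND(mu, sigma) such that a fraction p of the area lies below X."""
    return NormalDist(mu, sigma).inv_cdf(p)


if __name__ == "__main__":
    # Example 8.7: the second decile has 20 per cent of the area below it.
    print(f"D2 = {score_with_area_below(0.20, 25, 5):.2f}")
    # Example 8.8: P(X >= X1) = .1 means 90 per cent of the area lies below X1.
    print(f"X1 = {score_with_area_below(0.90, 25, 5):.2f}")
```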


Type 5. Score distance to either side of μ corresponding to a given central area.

Example 8.9. Given a ND universe with μ = 50 and σ = 10. What is the value of x₁ such that in this universe

        P(|x| ≤ x₁) = .95

Solution. It will be recalled that the vertical bars designate that only the absolute value (not the sign) of the distance represented by x = X - μ is to be considered. That is, we seek a score distance, x₁, which measured in both a positive and negative direction from 50 will exceed .95 of the score distances from 50 (see Figure 8.15). One way to obtain this information from Table II would be simply to locate the nearest value to .4750 in Column 2. This value happens to appear exactly in Column 2 and corresponds to a z- or σ-distance of 1.96. Since σ = 10, it follows that x₁ = 19.6. That is,

        P(|x| ≤ 19.6) = .95

FIGURE 8.15 Normal distribution with μ = 50, σ = 10 [scale marked 20, 30, 40, 50, 60, 70, 80]
Type 6. Score distance to either side of μ corresponding to a given extreme area.

Example 8.10. Given a ND universe with μ = 100 and σ = 20. What is the value of x₁ such that in this universe

        P(|x| ≥ x₁) = .02

Solution. Here we seek a score distance, x₁, which measured in either a positive or negative direction from 100 will be exceeded by .02 of the score distances from 100 (see Figure 8.16). One way to obtain this information from Table II is to locate the nearest value to .4900 in Column 2. This value is .4901 and corresponds to a z- or σ-distance of 2.33. Hence, x₁ = (20)(2.33) = 46.6 and

        P(|x| ≥ 46.6) = .02

FIGURE 8.16 Normal distribution with μ = 100, σ = 20
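Types 5 and 6 both ask for a distance from the mean, given either the central area or the combined tail area. A minimal sketch, assuming Python's standard `statistics.NormalDist`; both function names are illustrative.

```python
from statistics import NormalDist

UNIT = NormalDist()  # unit normal curve: mean 0, standard deviation 1


def distance_enclosing(central_area, sigma):
    """Type 5: distance x1 such that P(|x| <= x1) equals central_area."""
    z = UNIT.inv_cdf(0.5 + central_area / 2.0)   # half the central area lies above the mean
    return sigma * z


def distance_exceeded_by(extreme_area, sigma):
    """Type 6: distance x1 such that P(|x| >= x1) equals extreme_area."""
    z = UNIT.inv_cdf(1.0 - extreme_area / 2.0)   # half the extreme area lies in each tail
    return sigma * z


if __name__ == "__main__":
    print(f"Example 8.9:  x1 = {distance_enclosing(0.95, 10):.2f}")
    print(f"Example 8.10: x1 = {distance_exceeded_by(0.02, 20):.2f}")
```

The second result is about 46.5 rather than 46.6, because the text takes the nearest tabled z of 2.33 where the exact value is closer to 2.326.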


8.9 INTERPOLATION 


In solving the examples given in the foregoing section under types 4, 5, and 6, we were content with approximations to the z-values or σ-distances sought, since the given percentages or probabilities were not always to be found in Table II. In each such case we used the nearest tabled value. A similar situation would arise in determining percentages or probabilities corresponding to z-values given to three or more decimal places. Ordinarily approximations to the nearest tabled value are quite satisfactory. Occasionally, however, it may be desirable to improve on this degree of accuracy.


One way in which such improvement can be achieved is by making a
two-point linear interpolation. This simply amounts to locating a value
between two successive tabled values which occupies the same position
relative to them that the given argument (i.e., the value with which the
table is entered) occupies relative to the limits of the interval in which it
falls. The process is much the same as that employed in estimating per- 
centile ranks (see Example 4.1) and percentiles (see Example 4.4) of a 
grouped frequency distribution. It can best be presented in terms of 
specific examples. 


Example 8.11. In a normal distribution what is the percentile rank of a z-score of +1.277?

Solution. Here the given value of z with which we enter Table II is reported to three decimal places and consequently is not to be found in this table. However, the given value of z is clearly 0.7 of the way up the tabular interval in which it falls, for:

        Upper limit, z₂  = 1.28
        Given value, z_g = 1.277
        Lower limit, z₁  = 1.27

Here    D_1g* = z_g - z₁ = .007
and     D_12  = z₂ - z₁ = .01
∴       D_1g/D_12 = .007/.01 = 0.7

Hence, since the given value of z is 0.7 of the way up the tabular interval in which it falls, it follows that the required percentile rank (Column 6, Table II) must be 0.7 of the way between the percentile ranks of z₁ and z₂. The interpolation is completed as follows:

        PR(z₂ = 1.28) = 89.97
        PR(z₁ = 1.27) = 89.80
        Difference    =  0.17 and since
        0.7 × 0.17    =  0.119
        PR(z_g = 1.277) = 89.80 + 0.119 = 89.919 = 89.92

Example 8.12. Find the value of Q₃ in a unit normal distribution.

Solution. Q₃ = z for which PR is 75. Here we seek 75.00 in Column 6 of Table II. While this given PR is not in this table, we note that it is 0.45 of the way up the tabular interval in which it falls, for:

        Upper limit, PR₂  = 75.17
        Given value, PR_g = 75.00
        Lower limit, PR₁  = 74.86

Here    D_1g = PR_g - PR₁ = 0.14
and     D_12 = PR₂ - PR₁ = 0.31
∴       D_1g/D_12 = .14/.31 = 0.45

Hence, since the given PR-value is 0.45 of the way up the tabular interval in which it falls, it follows that the required z-value (Column 1, Table II) must be 0.45 of the way between the z-values that have the percentile ranks PR₁ and PR₂. Hence, we have

        z(PR₂ = 75.17) = 0.68
        z(PR₁ = 74.86) = 0.67
        Difference     = 0.01 and since
        0.45 × 0.01    = 0.0045
        z_g = Q₃ = P₇₅ = 0.67 + 0.0045 = 0.6745

*Read D_1g as "distance from z₁ to z_g."

In general* the problem of two-point linear interpolation may be reduced
to a formula as follows:

Let $a_g$ represent the given argument (the value with which the table
is to be entered) and $a_1$ and $a_2$ the limits of the tabular interval within
which $a_g$ falls. Let $c_g$ represent the consequent (the value to be obtained
from the table) corresponding to $a_g$, and $c_1$ and $c_2$ the consequents
corresponding to $a_1$ and $a_2$. Then following the same logical procedure as in
the foregoing examples, we have the formula:

$$c_g = c_1 + \frac{a_g - a_1}{a_2 - a_1}\,(c_2 - c_1) \qquad (8.10)$$


To illustrate, we shall apply (8.10) to the data of the two foregoing
examples. In the case of Example 8.11, we have

$a_g = 1.277$, $a_1 = 1.27$, $a_2 = 1.28$, $c_1 = 89.80$, and $c_2 = 89.97$

$$\therefore c_g = 89.80 + \frac{1.277 - 1.27}{1.28 - 1.27}\,(89.97 - 89.80) = 89.919$$


And in the case of Example 8.12, we have

$a_g = 75.00$, $a_1 = 74.86$, $a_2 = 75.17$, $c_1 = 0.67$, and $c_2 = 0.68$

$$\therefore c_g = 0.67 + \frac{75.00 - 74.86}{75.17 - 74.86}\,(0.68 - 0.67) = 0.6745$$


Or to find the approximate square root of 150.3 from Table I of Appendix
C, we have

$a_g = 150.3$, $a_1 = 150$, $a_2 = 151$, $c_1 = 12.247$, and $c_2 = 12.288$

$$\therefore c_g = 12.247 + \frac{150.3 - 150}{151 - 150}\,(12.288 - 12.247) = 12.259+$$

*This procedure, for example, may also be applied to using a table of square roots to
find the square root of some number not included in the table.
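The three applications of formula (8.10) above can be checked with a few lines of code. The following is a minimal sketch only; the function name `interpolate` and its argument names are illustrative choices that mirror the symbols of (8.10) and are not part of the text.

```python
def interpolate(a_g, a_1, a_2, c_1, c_2):
    """Two-point linear interpolation in the form of formula (8.10).

    a_g      : given argument (value with which the table is entered)
    a_1, a_2 : limits of the tabular interval containing a_g
    c_1, c_2 : tabled consequents corresponding to a_1 and a_2
    """
    if a_2 == a_1:
        raise ValueError("tabular interval has zero width")
    # Fraction of the way up the tabular interval at which a_g falls.
    fraction = (a_g - a_1) / (a_2 - a_1)
    return c_1 + fraction * (c_2 - c_1)


# Example 8.11: percentile rank of z = 1.277
pr = interpolate(1.277, 1.27, 1.28, 89.80, 89.97)
# Example 8.12: z-value whose percentile rank is 75
q3 = interpolate(75.00, 74.86, 75.17, 0.67, 0.68)
# Square root of 150.3 from tabled roots of 150 and 151
root = interpolate(150.3, 150, 151, 12.247, 12.288)

print(round(pr, 3), round(q3, 4), round(root, 3))
```

The results should agree with the hand calculations to the number of places carried in the examples; small differences in the last place can arise because the worked examples round the fraction (for instance, .14/.31 to 0.45) before multiplying.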


8.10 FITTING A NORMAL CURVE TO AN OBSERVED
FREQUENCY DISTRIBUTION


Occasionally it is of interest to note the extent to which the distribution 
of some real collection of measures approximates the form of the normal 
curve. Partly because there are many bell-shaped curves which are not 
normal curves, and partly because normal curves may vary markedly in 
appearance (see Figure 8.4), it is impossible to tell by visual inspection 
of a polygon or histogram whether or not a given collection of scores has a 
distribution which approximates the normal curve. One method of comparing
a real distribution with a normal distribution involves superimposing
on the histogram of the real distribution a true normal distribution which 
has the same mean, standard deviation, and total area as the real distribution. 
We shall illustrate the procedure involved in terms of two concrete examples. 

First, it will be recalled that the normal curves which we have pre- 
sented [formulas (8.1), (8.2), and (8.3)] have a total area of 1 and, hence, 
are appropriate only as models of relative frequency distributions. The 
simplest procedure, therefore, consists in superimposing a normal curve of
unit area on the histogram of the unit-interval relative frequency
distribution of the real data. Unit intervals are convenient, for if the height of a
rectangle is made equal to the relative frequency of the interval involved,
the area of this rectangle must also equal the relative frequency of this
interval and, therefore, the total area under the histogram will be the sum
of the relative frequencies (i.e., 1). The procedural steps are as follows:


Step 1. Compute the relative frequencies (f/N) associated with each unit
        interval (see Column 3, Table 8.11) and construct the histogram
        (see Figure 8.17).
Step 2. Obtain the mean and standard deviation of the given distribution.
Step 3. Using the mean and standard deviation obtained in Step 2, convert
        each interval midpoint (i.e., each unit point on the score scale) to a
        z-score (see Column 4, Table 8.11).
Step 4. Look up the heights (y's) of the unit normal curve (Column 4,
        Table II, Appendix C) which correspond to these z-scores (see
        Column 5, Table 8.11).
Step 5. Divide the heights obtained in Step 4 by the standard deviation
        obtained in Step 2 (see Column 6, Table 8.11).
Step 6. Plot points at the heights obtained in Step 5 above the unit points
        on the X-scale, and use these as guide points in sketching the
        required normal curve (see Figure 8.17).


Figure 8.17 Normal curve with unit area superimposed on histogram of
relative frequency distribution of Table 8.11 (vertical axis: Y and rel. f;
horizontal axis: X, in inches)
TABLE 8.11 Distribution of Heights in Inches of 4,451 Canadian
           Boys Age 9 Together with Calculations Necessary
           for Superimposing a Normal Curve*

Classes (X)       f       rf        z       y†        Y
    59           15     .003     3.11    .0032     .001
    58           20     .005     2.68    .0110     .005
    57           58     .013     2.26    .0310     .013
    56          146     .033     1.83    .0748     .032
    55          265     .060     1.40    .1497     .064
    54          462     .104     0.98    .2468     .105
    53          641     .144                       .146
    52          785     .176                       .168
    51          705     .158                       .163
    50          585     .131    -0.72    .3079     .131
    49          385     .087    -1.15    .2059     .088
    48          229     .051    -1.57    .1163     .049
    47          102     .023    -2.00    .0540     .023
    46           36     .008    -2.43    .0208     .009
    45           17     .004    -2.85    .0069     .003
    44                          -3.28    .0018     .001
  Total       4,451    1.000

*Based on A Height and Weight Survey of Toronto Elementary School Children,
1939, Dominion Bureau of Statistics, Social Analysis Branch, Ottawa.
For these 4,451 measures, $\bar{X} = 51.7$ and $s = 2.35$.
†From Column 4, Table II, Appendix C.

It is important to note that the heights obtained in Step 5 do not,
strictly speaking, represent relative frequencies,* which in the case of any
continuous trait can only be pictured graphically by segments of area.
These heights simply establish points on a normal curve, the total area
under which is 1.

It is clear from inspection of Figure 8.17 that the given distribution of
real heights is quite closely fitted by the normal curve model.

Occasionally there may be some preference for using a histogram of a
distribution of frequency counts rather than of relative frequencies. If the
heights of the rectangles of such a histogram are made to correspond to the
interval frequency counts, then the total area under the histogram is N


Figure 8.18 Normal curve with area Ni superimposed on histogram of
frequency distribution of Table 8.12 (vertical axis: Y or f)


*These heights do roughly approximate the normal-distribution relative
frequencies for the unit intervals. Consider, for example, the unit interval
having the score 55 as its midpoint. The normal-distribution relative frequency
for this interval is the area of the figure bounded by the segment of the curve
over the interval, the two perpendiculars erected at 54.5 and 55.5, and the
segment of the X-axis between 54.5 and 55.5. To the extent that this segment of
the curve approximates a straight line, the area of this figure is the product
of the length of its base (= 1) and its mean height, i.e., the Y-value
corresponding to X = 55. Since the base is 1, this height approximates the area
of this portion of the normal curve. If these heights were exact rather than
approximate values of the normal-distribution relative frequencies they, like
the rf-values of Table 8.11, would total 1.000.


when unit intervals are used and Ni when intervals of size i are used. Of
course, any curve superimposed on a histogram for purposes of comparison
must have the same total area as the histogram. All the normal curves
which we have thus far considered have unit area. It is a simple matter,
however, to convert such normal curves to others having any specified
area. All that is necessary is to multiply each Y-value by the magnitude
of the area desired, that is, by Ni or, of course, by N if i equals 1. Only
two of the procedural steps previously listed need be modified.* Step 1,
of course, now simply indicates the construction of the histogram of the
ordinary frequency distribution. The rectangle heights should be made to
represent frequency counts regardless of the size of interval used. The other
step requiring modification is Step 5. Here in finding the Y-values from
the y's, we must not only divide each y by s but we must also multiply the

TABLE 8.12 Distribution of Weights in Pounds of 4,451 Canadian
           Boys Age 9 Together with Calculations Necessary
           for Superimposing a Normal Curve†

Classes     Midpts. (X)        f       y‡         Y
115-119         117            3
110-114         112            5
105-109         107            5
100-104         102           11
 95-99           97           21
 90-94           92           25
 85-89           87           41
 80-84           82          103               118.5
 75-79           77          180               303.7
 70-74           72          468    .2492      593.1
 65-69           67          807    .3621      861.9
 60-64           62        1,084    .3970      944.9
 55-59           57          979    .3271      778.6
 50-54           52          553    .2012      478.9
 45-49           47          146    .0940      223.7
 40-44           42           20    .0325       77.4
 35-39           37                 .0086       20.5
 30-34           32                 .0017        4.0

†Based on same 4,451 boys involved in Table 8.11 with data taken from same source.
For these 4,451 measures, $\bar{X} = 62.9$ and $s = 9.35$.
‡From Column 4, Table II, Appendix C.

*Except, of course, for the additional fact that we deal with interval midpoints
instead of unit points.


resulting quotient by Ni. To illustrate, a histogram of the distribution of
weights shown in Table 8.12 has been constructed (see Figure 8.18). The
total area under this histogram is Ni = 4,451 × 5 = 22,255. A normal curve
having this same area has been superimposed on this histogram (see Figure
8.18 and Table 8.12). It is interesting to note that this distribution of
weights is skewed to the right and consequently is not nearly as well fitted
by a normal curve as was the distribution of heights of the foregoing
example.
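Steps 2 through 5, together with the modification for a curve of area Ni, can be sketched in code. This is an illustration under stated assumptions rather than the procedure of the text: the ordinate y is computed from the unit-normal formula instead of being looked up in Table II, z is not rounded to two places before the ordinate is found, and the names `normal_ordinate` and `fitted_heights` are hypothetical.

```python
import math


def normal_ordinate(z):
    """Height y of the unit normal curve at z (stands in for a table look-up)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def fitted_heights(midpoints, mean, sd, n=1, interval=1):
    """Heights Y of a normal curve with total area n * interval.

    With n = 1 and interval = 1 the curve has unit area (relative
    frequency histogram); with n = N and interval = i it has area Ni
    (frequency-count histogram).
    """
    area = n * interval
    heights = []
    for x in midpoints:
        z = (x - mean) / sd          # Step 3: midpoint to z-score
        y = normal_ordinate(z)       # Step 4: height of unit normal curve
        heights.append(area * y / sd)  # Step 5 (modified): Y = Ni * y / s
    return heights


# Weights of Table 8.12: mean 62.9, standard deviation 9.35, N = 4,451, i = 5
weight_curve = fitted_heights([72, 67, 62, 57, 52], mean=62.9, sd=9.35,
                              n=4451, interval=5)
for x, big_y in zip([72, 67, 62, 57, 52], weight_curve):
    print(x, round(big_y, 1))
```

Because the tabled procedure rounds each z to two decimal places before entering Table II, the Y-values printed here can differ from those of Table 8.12 by a unit or two.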


8.11 THE LACK OF GENERALITY OF THE NORMAL CURVE
AS A DISTRIBUTION MODEL


We have presented the normal curve as a suitable model for a distribu- 
tion of chance or random errors and have indicated that as such it has 
proved to be highly satisfactory. Unfortunately, however, certain early 
statisticians formed the view that this curve could be used to describe
almost any mass collection of data. Adolphe Quetelet* (1796-1874), for
example, believed that data from anthropometry, economics, criminology,
the physical sciences, botany, and zoology were all fundamentally "normal"
in form of distribution. He was further convinced that the same was true
of mental and moral traits and that verification of this point of view
awaited only the development of suitable measuring techniques. The
identity of the individual who first applied the adjective "normal" to the
particular curve we are considering is not definitely known, but the choice
undoubtedly stemmed from a point of view like that of Quetelet. Both
this adjective and point of view have tended to persist. 

Actually, if the student were to make a broad and representative collec- 
tion of frequency distributions of real data found in the research literature 
of education, psychology, sociology, anthropometry, and other related 
fields, and if he were to construct a histogram or even a smoothed polygon 
for each, he would find that his collection contained a wide variety of forms 
of distributions. Some curves would be skewed positively, others nega- 
tively, some would be bimodal, some U-shaped, some J-shaped, and some 
almost rectangular. It is true that many of them could be roughly described
as bell-shaped, but among these would be some too "peaked" and others 
too "flat-topped" to be represented by the "normal" curve model. The 
great variation in forms of distributions, even of a single trait, is strikingly 
illustrated by the age distributions presented in Figure 8.19. Because of 
this extreme variation in form, the student would find it impossible to 
phrase a single generalized description that would apply accurately to more 
than a small portion of the distributions he collected. There is, then, no


*See Helen M. Walker, op. cit. 


Figure 8.19 Age distributions of various populations in the United
States based on data reported in the Statistical Abstract of the United
States, 1955 (panels include: U.S. Filipino Population, 1950; U.S. Total
Population, 1950; First Admissions of Patients with Psychoses to State
Institutions, 1952; Aliens Naturalized in 1954; U.S. Widowers, 1954;
horizontal axis of each panel: Age, 0 to 90)


universal "law" of any kind, not to mention an underlying "law of
normality," concerning the form of frequency distributions in general.

There are two fundamental reasons why there can be no single
universally applicable frequency-distribution model, at least for distributions
of measures of any human trait. In the first place, it is clear that the
measures of a given trait may be distributed in different ways (forms) for
different populations. This is illustrated by the age distributions pictured in
Figure 8.19.


Of course, it is true that for certain populations the distributions of
certain traits appear to fit closely the normal-distribution model. The
distribution of heights of nine-year-old Toronto boys, for example, was
closely fitted by the normal-curve model (see Figure 8.17). Yet to make
the general statement, "Heights are normally distributed," is meaningless
unless accompanied by specification of the particular population involved.
It is, of course, meaningful to refer to the form of the distribution of
heights for all nine-year-old Toronto boys, or for the U.S. Filipino
population, or for all males in the United States; but, since the form of the
distribution would undoubtedly differ for each of these and other populations,
and since no one of them can be considered as the population, we cannot
reasonably consider any single curve as representing the form of the
distribution of heights in general. By way of further illustration, Figures 8.20
and 8.21 show normal curves fitted to distributions of heights (in inches)
of small groups of eighteen-year-old males and females respectively.* Both
groups were of a common ancestral stock. It is clear that the normal curve
model is as closely fitted by these distributions as can reasonably be
expected with groups no larger than these. Now, however, consider Figure
8.22, which shows a normal curve superimposed on the relative frequency
distribution formed by combining into a single group, involving both sexes,
the separate sex distributions pictured in Figures 8.20 and 8.21. It is clear
that the result of combining these two approximately normal distributions
having different means (68.7 and 64.0 for males and females, respectively)
is a distribution which is too "flat-topped" to be well represented by
the normal-curve model.† Thus, while heights are approximately normally
distributed for separate groups of eighteen-year-old males and females of
Northwest European ancestry, they are not normally distributed for a
combined group of such males and females. From this and the examples
previously cited, it is clear that we cannot talk meaningfully in general
terms about the form of the distribution of a single trait. It should be
obvious, then, that it would be even less fruitful to attempt to describe in
general the distribution of any and all traits.

This is not to imply that the normal curve is useful only as a model of
a distribution of chance errors. For highly homogeneous populations (e.g.,
individuals of the same age, sex, and ancestral stock), the distributions of
certain physical traits are closely fitted by the normal curve. This
model may, for example, be used with satisfactory results by manufacturers
to estimate the need of, say, adult males for shoes of various sizes.

Figure 8.20 Normal curve superimposed on histogram of relative frequency
distribution of heights of 18-year-old males (N = 85, $\bar{X} = 68.7$,
$s = 2.23$)

Figure 8.21 Normal curve superimposed on histogram of relative frequency
distribution of heights of 18-year-old females (N = 85, $\bar{X} = 64.0$,
$s = 1.85$)

*The data involved are unpublished measures collected between 1930 and 1945 by
Professor Howard V. Meredith of the State University of Iowa for children of Northwest
European ancestry who were enrolled as pupils in the State University of Iowa
Experimental Schools.

†In fact if the means of two normal distributions differ sufficiently, the combined
distribution will be bimodal.
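The flattening that results from combining the two sex groups can be examined numerically. The sketch below is illustrative only: it treats each group as exactly normal with the means and standard deviations reported for Figures 8.20 and 8.21, weights the groups equally (both have N = 85), and compares the combined curve with a single normal curve having the mean and standard deviation reported for Figure 8.22. The variable and function names are hypothetical.

```python
from statistics import NormalDist

# Idealized group distributions (assumed exactly normal for this sketch).
males = NormalDist(mu=68.7, sigma=2.23)
females = NormalDist(mu=64.0, sigma=1.85)

# Single normal curve with the mean and s.d. of the combined group.
fitted = NormalDist(mu=66.4, sigma=3.11)


def combined_density(x):
    """Height of the combined curve: equal-weight mixture of the two groups."""
    return 0.5 * males.pdf(x) + 0.5 * females.pdf(x)


for x in (60, 62, 64, 66.4, 68, 70, 72):
    print(x, round(combined_density(x), 4), round(fitted.pdf(x), 4))
```

Under these assumptions the combined curve lies well below the fitted normal curve at the combined mean and above it near the female mean, which is the sense in which the combined distribution is too "flat-topped" for the normal-curve model.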


The normal-curve model has proved quite appropriate for describing the
distribution for homogeneous populations of such physical traits as are
amenable to fundamental measurement in terms of a linear scale (e.g.,
height, waist, chest depth, chest width, foot length, etc.). On the other
hand, distributions of measurements of other physical traits show
considerable deviation from the normal pattern, even for homogeneous groups.
The distribution of weights of the nine-year-old Toronto boys proved to be
skewed positively (see Figure 8.18). This may be explained in part by the
fact that for individuals of the same body type, weights are roughly
proportional to the cubes of their heights. If heights are symmetrically
distributed, then the cubes of the heights must necessarily be skewed
positively, since large values are more affected by cubing than are small values
(see Section 7.9 and Table 7.6).

Figure 8.22 Normal curve superimposed on histogram of relative frequency
distribution of heights of 18-year-old males and females (N = 170,
$\bar{X} = 66.4$, $s = 3.11$)

The foregoing remarks suggest the second fundamental reason why
there can be no single universally applicable frequency-distribution model.
Not only do distributions of a given trait differ for different populations,
but they differ according to the particular scale employed in the
measurement of the trait. Since the choice of scale is arbitrary, distributions of
different sets of measurements of the same trait for the same group of
individuals may be made to differ in form in almost any way by simply
varying the measuring scale employed.

We have seen, for example, that distributions of incomes in dollars are
markedly skewed positively (see Table 2.9). Economists have found that,
if they measure incomes in terms of the logarithms of dollars, they obtain
distributions which are quite closely approximated by the normal-curve


model. We have also previously observed that, for homogeneous popula- 
tions, weight distributions were skewed positively because weights are 
roughly proportional to cubes of heights, which in turn are normally dis- 
tributed. If this proportionality were perfectly true, then weights would, 
like heights, be approximately normally distributed if we were to measure 
weights by means of a scale calibrated in terms of the cube roots of pounds. 
To the degree that this proportionality obtains, we would expect a distribu- 
tion of weights expressed as cube roots of pounds to be more nearly approxi- 
mated by the normal-curve model than a distribution of weights in pounds. 
To provide a concrete illustration, the weights of the 4,451 Toronto boys 
involved in Table 8.12 and Figure 8.18 were expressed in terms of cube 
roots of pounds. The histogram of the resulting distribution is shown in 
Figure 8.23. This histogram is clearly less skewed and more nearly ap- 
proximated by the normal-curve model than is that of Figure 8.18. 
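The effect of cubing, and of undoing it with a cube-root scale, can be illustrated with an idealized calculation. In this sketch the heights are taken to be exactly normal, the mean of 52 and standard deviation of 2.35 are illustrative values only, weights are taken to be exactly proportional to the cubes of heights, and skewness is measured by the average cubed standard score; none of these choices comes from the text.

```python
from statistics import NormalDist, mean, pstdev


def skewness(values):
    """Average cubed standard score: zero for a symmetrical distribution,
    positive when the distribution is skewed to the right."""
    m = mean(values)
    s = pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


# A symmetrical set of "heights": evenly spaced percentile points of a
# normal curve (illustrative mean and standard deviation).
height_model = NormalDist(mu=52.0, sigma=2.35)
n = 999
heights = [height_model.inv_cdf((k + 0.5) / n) for k in range(n)]

# Weights proportional to cubes of heights; the constant of
# proportionality has no effect on skewness, so it is omitted.
weights = [h ** 3 for h in heights]

# The same weights expressed on a cube-root scale.
cube_roots = [w ** (1.0 / 3.0) for w in weights]

print(round(skewness(heights), 3))
print(round(skewness(weights), 3))
print(round(skewness(cube_roots), 3))
```

In this idealized case the cubed values are positively skewed while the cube-root values recover the symmetry of the heights. With real weights the proportionality is only rough, so the cube-root scale reduces the skewness without necessarily removing it.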


Figure 8.23 Normal curve superimposed on distribution of cube roots
of weights of 4,451 Toronto boys involved in Figure 8.18 (horizontal axis:
Cube Roots of Pounds, 3.40 to 5.00)

Perhaps the most striking instances of the deliberate construction of
scales so as to produce normally distributed scores are to be found in
educational and psychological measurement. This can be accomplished either by
adjusting the difficulty of the items so as to make the raw-score distribution
normal, or by making some transformation of the raw scores which tends to
yield scores that are normally distributed.* If he so desired, a test author
could as easily prepare sets of items which would yield distributions skewed
positively or negatively by choosing items which tend to be either too
difficult or too easy for the group involved.

*The economists' conversion of incomes in dollars to incomes in logarithms of dollars
and our conversion of weights in pounds to weights in cube roots of pounds are
examples of transformations that sometimes have a normalizing effect. We will consider
another such transformation in a following section of this chapter.

As we have repeatedly indicated, in most educational and psychological 
test scales, the amount of the trait involved that corresponds to a scale unit 
varies in an unknown way from one part of the scale to another.
Consequently it is impossible to use distributions of such scores as a basis for
inferring the character of the distribution of the "true" amounts of the
trait possessed by the members of a given group, assuming that somehow
measurements of these "true" amounts could be determined. In seeking
to construct scales which produce normally distributed scores, educators
and psychologists are implicitly assuming the "true" amounts to be
normally distributed for the groups involved and are simply making their
scales conform to this purely a priori assumption. They are somewhat
abetted in this purpose by the fact that the scores yielded by their tests
usually involve rather large chance-error components which, of course, tend
to be normally distributed. The random addition of normally distributed
components to any set of non-normally distributed scores can only result
in a set of scores which is more nearly normally distributed than before.
Thus, the very inaccuracy of the scores yielded by educational and
psychological tests is itself a factor contributing to the tendency of such scores to
be normally distributed.
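One way to see why added chance error pushes a score distribution toward the normal form can be sketched as follows. The sketch rests on assumptions not stated in the text: the error component $E$ is taken to be normally distributed and independent of the score $X$, and departure from normality is judged only by skewness, here measured by the standardized third moment $\gamma_1$. Under those assumptions the third central moment of the sum equals that of $X$ alone, while the variance is increased by $\sigma_E^2$, so that

$$\gamma_1(X + E) = \gamma_1(X)\left(\frac{\sigma_X^2}{\sigma_X^2 + \sigma_E^2}\right)^{3/2}.$$

The factor in parentheses is less than 1 whenever $\sigma_E^2 > 0$, so the skewness of the observed scores is smaller in magnitude than that of the error-free scores, and the reduction is greater the larger the error component.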

It should be specifically noted that in the foregoing remarks there is
nothing critical intended regarding the practice of deliberately constructing
educational and psychological scales so as to yield normal-score
distributions. If there is some logical basis for the a priori assumption that "true"
amounts of some trait are normally distributed for a given population, then
it would not appear unreasonable that a measuring scale yield a distribution
of scores which conforms to this hypothesis. Students particularly
interested in educational and psychological measurement may, however, wish
to follow up the view advanced by some measurement authorities that the
most accurate rank-order scales are those which yield more nearly
rectangular score distributions.


With few exceptions, our primary concern with the normal curve will be
as a model of a distribution of chance errors. This is the situation for which
the formula of this curve was originally derived and its importance as a
model in this situation cannot be overemphasized. Its applications as such
a model will be treated at length in subsequent chapters. The student
should guard carefully against any tendency toward the general application
of this model to all types of data or against making too many assumptions
of normality, particularly with reference to distributions of measures of
mental or physical traits, and, most especially, when the population
involved is not highly homogeneous with reference to other characteristics
related to those studied.


8.12 THE AREA TRANSFORMATION (T-SCORES)


We have considered at some length the problem of attaching meaning 
to measurements arising from rank-order scales. The two basic approaches 
we have pursued involved the use of ranks (i.e., percentile ranks) and the 
use of standard scores (z and Z). In comparing percentile ranks and
standard scores (see Section 7.9), we pointed out that the most important
advantage of the PR over the z- or Z-score is its independence of the particular
measuring scale and of the form of the score distribution resulting from it. 
This advantage, of course, actually holds regardless of whether we are 
dealing with rank-order or fundamental scales. Now, if scales were con- 
structed so as to yield distributions of some "standard" form for the popu- 
lations involved, then standard scores could always be interpreted with 
reference to this "standard" form, and the standard-score scheme would be 
freed of one of its major disadvantages. In this section we shall consider a 
scheme for making the scale values derived for a given group or population 
fall into some desired distribution pattern. The "standard" form or pattern 
usually used is the normal distribution. We have previously indicated that 
normal scale score distributions—or for that matter, any other form of 
distribution—may be achieved either by manipulating item difficulties or 
by transformation to another scale. The latter procedure provides the 
simplest technique for manipulating the form of such distributions. 

We actually have in $z = (X - \bar{X})/s$ and in $Z = 10z + 50$ examples of
linear transformations (see Sections 7.4 and 7.6). Such transformations,
of course, do not alter the form of the score distribution (see Section 7.7).
In the foregoing section, however, we illustrated a non-linear
transformation (using a weight score equal to the cube root of weight in pounds)
which yielded scores more nearly normally distributed than the original
values. In this section we shall illustrate another type of transformation,
known as an area transformation, which may be employed to manipulate
the form of a distribution in any desired way. This scheme is in one sense
a combination of the percentile-rank and standard-score approaches.
Though it is usually used as a normalizing transformation, it is a general
scheme and may be employed to produce a score distribution of any
desired form. Consequently, we shall present it in general terms.

Suppose that we have given a purely theoretical distribution
having some standard form.* Now suppose further that for some real
group of individuals or objects we have a collection of scores (X-values).
The distribution of these X-scores may be of any form whatever. We seek
a scheme for transforming these real X-scores into, say, W-scores in such a
way that the distribution of transformed scores will have the same form as
the theoretical distribution, that is, the standard form. We shall define a
point on the X-scale as equivalent to a point on the W-scale if the percentile
rank of the X-point with reference to the real X-distribution is the same as
the percentile rank of the W-point with reference to the theoretical
distribution having the standard form. By this rule or definition of equivalence
the real X-values are transformed to a new scale (W-scale) in such a way
that the area (representing relative frequency) under the histogram to the
left of (i.e., below) this new value for the real distribution is the same as
that below this particular value in the theoretical distribution.† It follows,
then, that the form of the real distribution of new W-values must be the
same as that of the theoretical distribution.

The student can best gain an appreciation of how this area
transformation scheme functions by consideration of specific examples. Although the
scheme is usually used with normal theoretical distributions, we shall find
it simpler in our first illustration to consider a theoretical distribution that
is rectangular. We shall arbitrarily use the particular rectangular
distribution which ranges from 5 to 15 in scale values. As our assumed (this
example is hypothetical) "real" distribution we shall use a bell-shaped
symmetrical distribution. These two distributions, "real" and theoretical,
are presented graphically in Figure 8.24.

Figure 8.24 Histogram of relative frequency distribution of an assumed
real collection of X-values and graph of theoretical rectangular relative
frequency distribution of W's (vertical axes: rel-f, in per cent; horizontal
axes: X and W)

*Later we shall also incorporate standard values for the mean and standard deviation.

†It is for this reason that transformations of this type are known as area
transformations.


Now following the procedure described in Section 4.9, we shall
construct the relative cumulative frequency ogives of these two distributions
on the same axes. The horizontal axis must, of course, include values of
both X- and W-scales. These ogives are shown in Figure 8.25. Now to
effect the desired transformation, we use these ogives to read, for any given
X-value, the W-value which has the same percentile rank. For example,
the broken line on the figure shows that W = 7 has the same PR (20) as
X = 4 and, hence, any individual scoring 4 on the X-scale is assigned a score
of 7 on the W-scale. The W-equivalents of the unit points and of the real
limits of the unit intervals on the X-scale are shown in Table 8.13.

Figure 8.25 Relative cumulative frequency ogives corresponding
to distributions pictured in Figure 8.24 (vertical axis: rcf (PR), 0 to 100;
horizontal axis: X- and W-Score Scales, 0 to 16)

TABLE 8.13 Table for Transforming X-Values to W-Values in the
           Case of the Distributions of Figure 8.24

Given X-Score     Corresponding W-Score
     2.5                  5.0
     3.0                  5.5
     3.5                  6.0
     4.0                  7.0
     4.5                  8.0
     5.0                 10.0
     5.5                 12.0
     6.0                 13.0
     6.5                 14.0
     7.0                 14.5
     7.5                 15.0

Table 8.14 presents the relative frequency distribution of the
collection of real scores both in terms of the original X-scale and the new
W-scale. The columns headed "Hts." give the heights of the rectangles
comprising the histograms of these distributions. These heights were
determined by dividing the area of each rectangle (i.e., the interval relative
frequency, rf) by its base (i.e., the width of the interval, i). Since the
intervals on the X-scale are all unity, the heights are the same as the interval
relative frequencies. The W-scale intervals, on the other hand, vary in
size. It will be observed that the variation occurs in such a way that, when
the areas of the rectangles are made to correspond to the interval relative
frequencies, all the rectangles are the same height. Thus the symmetrical
unimodal distribution of X-values is transformed into a rectangular
distribution of W-values. The histograms of these two distributions are
presented in Figure 8.26. How this transformation, in effect, stretches some
parts of the scale and contracts others so as to produce a distribution of the
desired form is illustrated in this figure by the broken lines connecting
corresponding points on the two scales.

TABLE 8.14 Relative Frequency Distribution of Assumed Real
           Collection of X-Values Pictured in Figure 8.24 and
           Also of Their Corresponding W-Values

Real Limits            Hts.       Real Limits            Hts.
 (X-Scale)     rf     (rf/i)       (W-Scale)     rf     (rf/i)
  6.5-7.5      10       10         14.0-15.0     10       10
  5.5-6.5      20       20         12.0-14.0     20       10
  4.5-5.5      40       40          8.0-12.0     40       10
  3.5-4.5      20       20          6.0-8.0      20       10
  2.5-3.5      10       10          5.0-6.0      10       10
              100                               100

Figure 8.26 Histograms of relative frequency distributions of
Table 8.14 (upper scale: X; lower scale: W; broken lines connect
corresponding points on the two scales)
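The rectangular example can be reproduced with a short calculation in place of reading values from the ogives. What follows is a sketch under stated assumptions: percentile ranks are obtained by straight-line interpolation within each interval of the real distribution (as an ogive drawn with straight segments would give), the relative frequencies are those of Table 8.14 listed from the lowest interval upward, and the function names are hypothetical.

```python
def percentile_rank(x, limits, rel_freqs):
    """PR of the point x in a grouped distribution.

    limits    : real limits of the intervals in ascending order
    rel_freqs : relative frequency (per cent) of each interval
    """
    if x <= limits[0]:
        return 0.0
    if x >= limits[-1]:
        return 100.0
    cumulative = 0.0
    for lower, upper, rf in zip(limits, limits[1:], rel_freqs):
        if x <= upper:
            # Straight-line interpolation within the interval.
            return cumulative + rf * (x - lower) / (upper - lower)
        cumulative += rf
    return 100.0


def to_rectangular(pr, low=5.0, high=15.0):
    """W-value having percentile rank pr in a rectangular
    distribution extending from low to high."""
    return low + (high - low) * pr / 100.0


# Assumed "real" distribution of Figure 8.24 and Table 8.14.
limits = [2.5, 3.5, 4.5, 5.5, 6.5, 7.5]
rel_freqs = [10, 20, 40, 20, 10]

for x in (2.5, 3.0, 4.0, 5.5, 7.0, 7.5):
    pr = percentile_rank(x, limits, rel_freqs)
    print(x, pr, to_rectangular(pr))
```

For a theoretical distribution of some other form, only the second function would change: it would return the point of that distribution below which the stated percentage of the area lies.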

In our second example we shall use a theoretical W-distribution that is
normal in form. This, as we have indicated, is the “standard” form most
commonly used in the construction of educational and psychological scales.
For our example, however, we shall use the fundamental measurement data
on weights given in Table 8.12. Specifically, we shall transform this
positively skewed distribution of weights to a theoretical normal
distribution of W-values. We shall specify in addition that the mean and
standard deviation of this theoretical distribution have the standard
values 50 and 10 respectively. Since the scores arising as a result of
this particular transformation are usually designated by T, we shall use
this symbol instead of W to designate the transformed values, and we
shall, hereinafter, refer to this particular transformation as the
T-transformation and to the resulting scores as T-scores. The
normal-distribution relative cumulative frequencies (percentile ranks)
associated with selected points on the T-scale are given in Table 8.15.
These values were read directly from Columns 6 and 7 of Table II,
Appendix C. Table 8.16 gives the relative cumulative frequencies


TABLE 8.15  PR's Associated with Selected Score Points (T's) of a ND
            Having μ = 50 and σ = 10




TABLE 8.16  Relative Cumulative Frequencies (PR's) Associated With
            Upper Real Limits of Intervals of Distribution of Weights
            Given in Table 8.12

UPPER REAL LIMIT       PR

     119.5           100.0
     114.5            99.9
     109.5            99.8
     104.5            99.7
      99.5            99.5
      94.5            99.0
      89.5            98.4
      84.5            97.5
      79.5            95.2
      74.5            91.2
      69.5            80.7
      64.5            62.6
      59.5            38.2
      54.5            16.2
      49.5             3.8
      44.5             0.5
      39.5             0.0

Figure 8.27  Ogives of relative cumulative frequency distributions of
Tables 8.15 and 8.16


(PR's) associated with the upper real limits of the real weight
distribution presented in Table 8.12. The two ogives are shown in
Figure 8.27. While the same rcf-scale may be used for both ogives,
different score scales (one for the X-scores and one for the T-scores)
are required. In Figure 8.27 these two score scales have been permitted
to overlap in order to make the figure more compact. If desired,
completely different portions of the horizontal axis could be given over
to these two score scales, though it is usually more convenient to allow
them to overlap to some extent as in Figure 8.27. There is no rule that
need be followed regarding the placement of these score scales on the
horizontal axis except that they should be so placed that the two ogives
are distinctly separated. These ogives may now be used to determine
T-values having the same percentile ranks as given X-values. The broken
line of Figure 8.27 shows, for example, that a T-value of 53.2 has the
same PR as an X-value of 64.5. In this manner a conversion or
transformation table can be constructed which gives the T-value
corresponding to any given X-value. Actually, it is not necessary


TABLE 8.17  Relative Frequency Distributions of a Real Collection* of
            Weights in Pounds and in T-Scores

REAL LIMITS      rf      Hts        REAL LIMITS      rf      Hts
(POUNDS)                (rf/i)      (T-VALUES†)             (rf/i)

114.5-119.5      0.1     0.02       81.0-            0.1      --
109.5-114.5      0.1     0.02                        0.1     0.05
104.5-109.5      0.1     0.02                        0.1     0.08
 99.5-104.5      0.2     0.04                        0.2     0.12
 94.5-99.5       0.5     0.10                        0.5     0.20
 89.5-94.5       0.6     0.12                        0.6     0.33
 84.5-89.5       0.9     0.18                        0.9     0.47
 79.5-84.5       2.3     0.46                        2.3     0.79
 74.5-79.5       4.0     0.80                        4.0     1.25
 69.5-74.5      10.5     2.10                       10.5     2.19
 64.5-69.5      18.1     3.62                       18.1     3.20
 59.5-64.5      24.4     4.88                       24.4     3.04
 54.5-59.5      22.0     4.40                       22.0     3.19
 49.5-54.5      12.4     2.48                       12.4     1.50
 44.5-49.5       3.3     0.66                        3.3     0.41
 39.5-44.5       0.5     0.10                        0.5      --
               -----
               100.0

* Given in Table 8.12.
† Using the ogives in Figure 8.27 the student will not be able to verify
these T-values to the degree of accuracy to which they are reported. He
should, however, satisfy himself that they may be roughly checked by
use of Figure 8.27, and that they may be exactly checked by following
the three procedural steps cited on p. 226.




to use ogives to determine T-values corresponding to any X-values.
Specifically, the $T_1$ corresponding to $X_1$* may be found as follows:

(1) Find the PR of $X_1$ in the real (X) distribution.

(2) Using either Column 6 or 7 of Table II of Appendix C, find the
    value of $z_1$ (i.e., the unit normal deviate) that has this same PR.

(3) Then, $T_1 = 10z_1 + 50$.
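The three steps may be sketched in code as follows. This is a minimal
illustration under stated assumptions, not the tabular procedure itself:
step (1) is done by looking up a few rows of Table 8.16, and step (2)
uses the inverse normal function of Python's standard `statistics`
module in place of Table II of Appendix C. The names `pr_of_x` and
`t_score` are ours. Values obtained this way may differ in the last
digit from values read from a printed table.

```python
from statistics import NormalDist

# Step (1): PR's of selected upper real limits, copied from Table 8.16.
pr_of_x = {59.5: 38.2, 64.5: 62.6, 69.5: 80.7, 74.5: 91.2}


def t_score(pr):
    """Steps (2) and (3): find the unit normal deviate z having the
    given PR, then return T = 10z + 50. Requires 0 < pr < 100."""
    z = NormalDist().inv_cdf(pr / 100)
    return 10 * z + 50


t = t_score(pr_of_x[64.5])
print(round(t, 1))  # agrees with the 53.2 read from Figure 8.27
```

Note that a PR of exactly 0 or 100 has no finite unit normal deviate,
which is why the extreme intervals of the T-scale are left open-ended.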


Our reason for incorporating the use of ogives in this example arises
from the fact that oftentimes the resulting conversion tables are applied
to the determination of T-scores for individuals whose X-scores were not
included in the original real X-distribution. In other words, the
applicability of the conversion table is often generalized or extended to
a population of individuals which, except for chance irregularities, is
assumed to have been adequately represented by the real collection at
hand. As we have previously seen, the effect of such chance
irregularities may be somewhat diminished by the free-hand smoothing of
the ogive of the real X-distribution (see Section 4.10). It is because
such smoothing is most easily accomplished with ogives that we have made
use of them as a device for establishing the conversion table.

* $X_1$ is used to refer to a particular value on the X-scale.

Figure 8.28  Histograms of relative frequency distributions of
Table 8.17

Table 8.17 shows the T-values corresponding to the real limits of the
intervals on the X-scale. This table also gives the relative frequency for
each interval, as well as the heights the histogram rectangles must be made
so that their areas represent (equal) the relative frequencies of the
corresponding intervals. The two histograms are presented in Figure 8.28.
For purposes of comparison, a normal curve (μ = 50, σ = 10, and
area = 100) has been superimposed on the histogram of the T-score
distribution.

It should be noted that in the T-scale we have a type of standard-score
scale which, like the z- and Z-scales, indicates the location of a score in
terms of a standard mean (50) and standard deviation (10). Unlike the
z- and Z-scales, however, the form of the distribution of T-values will be
approximately normal,* regardless of the form of the original
X-distribution. Hence, any given T-value always has the same PR, a fact
which greatly simplifies problems of interpretation. It should be noted
that, while the T-scale is perhaps most often used, constructors of
educational and psychological scales have not infrequently employed, as
theoretical distributions, normal distributions with means and standard
deviations differing from 50 and 10.

Since a given T-value always has the same percentile rank, it is
possible to set up a table giving the T-value corresponding to any
percentile rank. Table III of Appendix C is such a table. Table IV of
Appendix C gives the percentile rank corresponding to a given T-value.
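The fixed correspondence between T-values and percentile ranks may be
illustrated by a brief sketch. It is offered only as an approximation to
what the appendix tables provide: the functions `pr_from_t` and
`t_from_pr` are hypothetical names of our own, and they compute from the
normal distribution with mean 50 and standard deviation 10 rather than
reading printed tables.

```python
from statistics import NormalDist

# The theoretical T-distribution: normal, mean 50, standard deviation 10.
T_DIST = NormalDist(mu=50, sigma=10)


def pr_from_t(t):
    """Percentile rank of a given T-value."""
    return 100 * T_DIST.cdf(t)


def t_from_pr(pr):
    """T-value having a given percentile rank (0 < pr < 100)."""
    return T_DIST.inv_cdf(pr / 100)


for t in (30, 40, 50, 60, 70):
    print(t, round(pr_from_t(t), 1))
```

Since the two functions are inverses of one another, converting a
T-value to its PR and back returns the original T-value.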


8.13 ASSIGNING LETTER GRADES 


Because so much misunderstanding exists regarding a practice known
as "grading on the curve," we shall conclude this chapter with a few
remarks on the transformation of rank-order, numerical scale values into
letter grades. Though these remarks may be extended to any system of
letter grades, we shall consider only the most widely used system which
involves classifying the individuals into five levels of achievement or
quality. While these levels are usually designated A, B, C, D, and F from
highest to lowest quality level, respectively, other symbols are not
infrequently chosen.

The scheme which is usually referred to by the phrase "grading on the
curve" may actually be thought of as involving an area transformation.


* Approximately, since, strictly speaking, only an infinite collection
of the true amounts of a continuous attribute can be normally distributed.


Actually, the final result of applying this transformation is a simple one
which could be presented more easily were no reference made to the theory
of the area transformation. We have, nevertheless, elected to follow the
area transformation approach primarily to provide the student with further
experience with this type of transformation. In applying this
transformation, we shall use as our theoretical (W) distribution one
which we shall derive from the normal distribution. Arbitrarily
considering the range of a normal distribution to be 6 σ's (see
Section 8.3), we shall divide this range into five equal segments or
intervals, each spanning 1.2 σ's (i.e., 6/5 σ's). We can, without loss of
generality, think in terms of the unit normal distribution of (8.3).
Then, the scale values involved are in terms of z-units and the limits of
the five intervals, beginning with the highest, are: +1.8 to +3.0; +0.6
to +1.8; -0.6 to +0.6; -1.8 to -0.6; and -3.0 to -1.8. We may, of course,
if we prefer, view the highest and lowest intervals as open-ended, that
is, as having the limits +1.8 to +∞ and -∞ to -1.8, respectively.
Table 8.18 shows the PR's of the limits of these intervals, as well as
the percentage of cases in each. The PR-values were read from either
Column 6 or 7 of Table II, Appendix C. The percentages (relative
frequencies) were obtained by finding the difference between the
PR-values corresponding to the upper and lower limits of the intervals.

TABLE 8.18  The Theoretical Normal Distribution Used in the Area
            Transformation to a Letter Scale

LETTER           INTERVAL           PR-VALUE         PER CENT
DESIGNATION      LIMITS (z)         OF LIMITS        OF CASES

     A           +1.8 to +∞         96.4-100            3.6
     B           +0.6 to +1.8       72.6-96.4          23.8
     C           -0.6 to +0.6       27.4-72.6          45.2
     D           -1.8 to -0.6        3.6-27.4          23.8
     F             -∞ to -1.8          0-3.6            3.6
                                                      -----
                                                      100.0

Now to form the theoretical (W) distribution of the area transformation,
we shall associate the relative frequencies thus derived from the normal
distribution with the five-point letter scale. The histogram of this
W-distribution (the letters indicate the interval midpoints) is shown in
Figure 8.29. To set up a conversion table for the area transformation, it
is sufficient to note first that the PR of the upper real limit of the
F interval is 3.6 and, hence, any F has a PR of 3.6 or less.
Consequently, any X having a PR of 3.6 or less has a PR equal to that of
some F. Since there is no differentiation among F's in this system, all
such X's transform into F's. The result is simple, the lowest 3.6 per
cent of the scores in the X-distribution being transformed into F's.
Second, the PR-values of the lower and upper real limits of the
D interval are 3.6 and 27.4, respectively. Hence, any D has a PR between
3.6 and 27.4. Consequently, any X having a PR between 3.6 and 27.4 has a
PR equal to that of some D and, since there is no differentiation among
D's, all such X's transform into D's. Again the result is simple, the
23.8 per cent (27.4 - 3.6) of the X's lying just above the lowest 3.6 per
cent transform into D's. By similar arguments, we arrive at the simple
conversion table shown in Table 8.19.

Figure 8.29  Histogram of theoretical (W) distribution used in area
transformation


TABLE 8.19  Table for Converting Numerical Scores (X's) into Letter
            Grades

     A      highest   3.6%
     B      next     23.8%
     C      next     45.2%
     D      next     23.8%
     F      lowest    3.6%


Obviously, all that is actually necessary to effect this transformation 
is to rank the individuals in order of their X-scores and then to assign 
letter grades as specified in Table 8.19. 
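The derivation of these percentages, and the assignment of letters from
percentile ranks, may be sketched as follows. The sketch rests on
assumptions of our own: the normal-curve areas are computed with
Python's standard `statistics` module rather than read from Table II,
the PR's are rounded to one decimal as in Table 8.18, and a score whose
PR falls exactly on a limit is placed in the lower category. The names
`pr_limits`, `percents`, and `letter_for_pr` are illustrative only.

```python
from statistics import NormalDist

unit_normal = NormalDist()          # mean 0, standard deviation 1
z_limits = [-1.8, -0.6, 0.6, 1.8]   # interior interval limits in z-units

# PR of each limit, rounded to one decimal place.
pr_limits = [round(100 * unit_normal.cdf(z), 1) for z in z_limits]

# Percentage of cases for each letter: the difference between the PR's
# of the upper and lower limits of its interval.
bounds = [0.0] + pr_limits + [100.0]
percents = {
    letter: round(upper - lower, 1)
    for letter, lower, upper in zip("FDCBA", bounds, bounds[1:])
}


def letter_for_pr(pr):
    """Letter grade for a score whose percentile rank is pr."""
    for letter, upper in zip("FDCB", pr_limits):
        if pr <= upper:
            return letter
    return "A"


print(pr_limits)
print(percents)
```

Only the rank order of the X-scores enters into `letter_for_pr`, which
reflects the fact that the scheme is independent of the particular
measuring scale employed.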

The justification for this procedure rests on the assumption that the
true amounts of achievement, or of whatever trait may be involved, are
normally distributed for the group at hand. It is, of course, possible for
this assumption to be satisfied, regardless of the form of the distribution
yielded by the particular measuring scale employed. In fact, one of the
major advantages of this technique is its independence of the particular
measuring scale employed, which enters into the scheme only to the extent
of establishing the ranks of the individuals involved.

Now this assumption of normally distributed true trait amounts may
not be unreasonable if the group involved is large and more or less
randomly selected. On the other hand, if the group is small and highly
selective, as may be true of students electing some advanced school
subject, it is likely that no single member should be classified in the
F (or perhaps even the D) category in terms of any reasonable standard.
Yet, if this transformation is rigidly applied, 3.6 per cent of the
membership of such groups will be assigned F's, a designation usually
implying failure. Obviously, then, this scheme cannot be appropriately
applied except in situations in which it is reasonable to assume that the
true amounts of the trait involved are normally distributed for the group
at hand.

Quite often percentages other than those given in Table 8.19 are used
in applying this scheme. One set sometimes advocated consists of 7 per
cent A's, 21 per cent B's, 44 per cent C's, 21 per cent D's, and 7 per cent
F's. Use of different sets of percentages amounts to nothing more than the
use of different theoretical (W) distributions in effecting the area
transformation. The choice of such theoretical distributions is always
arbitrary, so that any set of percentages deemed suitable for the group
involved may be employed. It is not even necessary that the theoretical
distribution be symmetrical. It should always be understood, however,
that the theoretical distribution selected is assumed to represent that
of the true trait amounts for the group at hand.

An alternative scheme sometimes advocated may be viewed as involving
a linear transformation of the obtained X-scores. The transformation used
is simply the z-transformation [i.e., $z = (X - \bar{X})/s$]. Then letter
grades are assigned according to the same interval limits (z-units) as are
shown in Table 8.18. It follows, therefore, that if the distribution of
original scores (X's) is normal, this transformation results in precisely
the same assignment of letters as does the area-transformation scheme that
gave rise to Table 8.19. On the other hand, if the distribution of X's is
not normal, then the results of the two schemes differ.

Consider, for example, a smoothed polygon representing a hypothetical
distribution of X's which is negatively skewed (see the shaded area of
Figure 8.30). In such a distribution it may well be that no score is 1.8
standard deviations above the mean. If this is the case, then no z will
be +1.8 or higher and, consequently, no A's will be assigned. Now if it
happens that the form of the raw-score distribution is the same as that
of the true trait amount, that is to say, if there are no individuals in
the group at hand who actually belong to the A category (see dotted
portion of polygon of Figure 8.30), then the fact that no A's are
assigned is precisely as it should be. It may be, however, that the
negative skew of the X-distribution is not due to the absence of an
A achievement group, but rather to peculiarities of the particular
measuring scale employed.* In this situation the failure of the scheme to
assign A's may represent gross error.
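That a negatively skewed collection may yield no z as high as +1.8 can
be seen in a small numerical sketch. The ten scores below are
hypothetical and were made up for illustration, as is the function name
`z_grade`. We assume that the standard deviation is computed with N in
the denominator and that a z falling exactly on a limit goes to the
higher letter; neither choice is dictated by the text.

```python
from statistics import mean, pstdev


def z_grade(z):
    """Letter for a z-score, using the interval limits of Table 8.18."""
    if z >= 1.8:
        return "A"
    if z >= 0.6:
        return "B"
    if z >= -0.6:
        return "C"
    if z >= -1.8:
        return "D"
    return "F"


# Hypothetical negatively skewed scores on a test with a low ceiling.
scores = [1, 5, 7, 8, 9, 9, 10, 10, 10, 10]
m, s = mean(scores), pstdev(scores)
z_scores = [(x - m) / s for x in scores]
grades = [z_grade(z) for z in z_scores]

print(round(max(z_scores), 2))
print(grades)
```

Here even the four individuals with the highest possible score receive
only B's, because the ceiling of the test keeps every score within 1.8
standard deviations of the mean.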

The appropriate application of this scheme rests basically on two
assumptions. First, to justify the particular interval limits chosen, we
must assume that the group at hand is a subgroup or sample from a large
unselected group or population which is normally distributed with respect
to the particular X-scale employed. Secondly, we must assume that the
X-scores for the group at hand have approximately the same mean and
standard deviation as would the large unselected group, so that the
z-scores specifically obtained provide the proper basis for classification
with reference to the large group. It is apparent that in situations where
selection operates these two assumptions are inconsistent. The successful
operation of this technique would be greatly enhanced if the large group
mean and standard deviation were known or could be estimated independently
from the given set of X-scores.

Figure 8.30  Smoothed polygon of hypothetical distribution of original
scores (X's)

In conclusion, two further points should be made with respect to this
latter scheme. First, the use of the interval limits of Table 8.18
represents an arbitrary choice. Just as the percentages of Table 8.19 may
be arbitrarily varied, so may the interval limits used in this latter
scheme. Second, there is no need, in applying this scheme, to convert all
the X-scores into z-scores in order to effect the classification. Instead,
we may convert the z-limits to X-units. The classification may then be
effected by referring the original X-scores to interval limits expressed
in X-units. Since the z-value for any X is given by

$$z = \frac{X - \bar{X}}{s}$$

it follows that the X-value for any z is given by:

$$X = \bar{X} + zs \tag{8.11}$$


* This would happen, for example, in the case of an achievement test
which had inadvertently been made too easy. Such a test does not provide
the superior student an opportunity to demonstrate his superiority, since
even average students may make nearly perfect scores, and results in a
negatively skewed score distribution.


For example, if $\bar{X} = 25.7$ and $s = 5.3$, application of (8.11)
converts the z-limits into X-limits as shown in Table 8.20. The
non-overlapping X-limits shown in this table were obtained by rounding the
upper limits down and the lower limits up to the next unit point.


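TABLE 8.20  Letter Interval Limits in z- and X-Units ($\bar{X} = 25.7$
            and $s = 5.3$)

LIMITS               LIMITS                 NON-OVERLAPPING
z-Units              X-Units                X-Limits

+1.8 to +∞           35.24 to +∞            36 and above
+0.6 to +1.8         28.88 to 35.24         29 to 35
-0.6 to +0.6         22.52 to 28.88         23 to 28
-1.8 to -0.6         16.16 to 22.52         17 to 22
  -∞ to -1.8            -∞ to 16.16         16 and below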
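The conversion of z-limits into X-limits by (8.11) is a matter of simple
arithmetic, which the following sketch carries out for the mean 25.7 and
standard deviation 5.3 used in the text. The names `x_limits`,
`lowest_score`, and `letter_for_score` are ours, and the sketch assumes
that the X-scores are whole numbers, so that each lower limit may be
rounded up to the next unit point.

```python
import math

mean_x, sd_x = 25.7, 5.3             # the values used in the text
z_limits = [1.8, 0.6, -0.6, -1.8]    # interior z-limits, highest first

# Formula (8.11): X = mean + z * s
x_limits = [round(mean_x + z * sd_x, 2) for z in z_limits]

# Lowest whole-number score earning each letter: lower limit rounded up.
lowest_score = {
    letter: math.ceil(x) for letter, x in zip("ABCD", x_limits)
}


def letter_for_score(x):
    """Classify a whole-number X-score without computing its z-score."""
    for letter in "ABCD":
        if x >= lowest_score[letter]:
            return letter
    return "F"


print(x_limits)
print(lowest_score)
```

Once the limits have been expressed in X-units in this way, each
original score may be classified by inspection, and no individual
z-scores need be computed.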




INTRODUCTION TO 
SAMPLING THEORY 


9.1 THE GENERAL NATURE OF SAMPLING STUDIES


A large majority of the research studies in education and psychology,
or for that matter, in many other fields, are of a type known as sampling
studies. In such studies measurements or observations are made of a limited
number or sample of individuals or objects in order that generalizations or
inferences may be drawn about still larger groups or populations of the
individuals or objects that these samples are supposed to represent.
Because the individuals or objects comprising these populations differ
from one another, and because chance or uncontrolled influences always
play some part in determining which of these differing individuals are to
constitute the sample used, any single fact obtained from the examination
of the sample is almost certain to differ by some amount from the
corresponding fact for the whole population. Such “sample facts,”
therefore, may never be accepted as exactly descriptive of, or equivalent
to, the corresponding facts for the whole population.

Consider, for example, the type of sampling study which is perhaps
most widely known to the general public, the public opinion survey. In
the fall of 1948, the American Institute of Public Opinion, popularly known
as the Gallup Poll, reported in the press its prediction of the outcome of
the presidential election of that year. This prediction was based on a
sample presumed to represent the population of individuals who would cast
their ballots for president on November 2, 1948. The individuals
constituting this sample were asked, in advance of November 2, for whom
they would vote if the election were assumed to be in progress, that is,
to be taking place on the day the question was asked. The percentages of
individuals in the sample indicating they would vote for Dewey, Truman,
Thurmond, Wallace, or some other candidate were reported as predictive of
the election outcome. Whether or not such sample percentages are good
estimates of the population percentages depends upon the degree to which
the sample is representative of the population involved.

Table 9.1 shows the sample percentages as they were published by the
American Institute of Public Opinion and corresponding population
percentages as reported in Statistics of the Presidential and
Congressional Election of November 2, 1948 (Government Printing Office,
1949). It will be observed that the discrepancies between the sample and
population percentages are sufficiently great to invalidate the election
forecasts based on these sample percentages. We shall not attempt, at this
point, to explain the failure of this sample to correspond more closely to
the population involved.* The example is intended simply to show (a) how a
sample may be used to infer facts about a population, and (b) the risk of
error which is encountered in making such inferences. It should be obvious
that this risk is sufficiently serious to demand a careful study of what
may reasonably be anticipated by way of such errors.

TABLE 9.1  Sample and Population Percentages of Votes for 1948
           Presidential Candidates

                  SAMPLE             POPULATION
                  PERCENTAGES        PERCENTAGES
CANDIDATES        (Gallup Poll)      (National Vote)

Dewey                                    45.1%
Truman                                   49.5
Thurmond                                  2.4
Wallace                                   2.4
Other                  --                 0.6
                    ------              ------
                    100.0%              100.0%


9.2 DEFINITIONS AND BASIC CONCEPTS OF SAMPLING-ERROR THEORY

Population


By population we mean the aggregate or totality of objects or individuals
regarding which inferences are to be made in a sampling study.


* As will be explained later, samples from one population (the voting
public prior to November 2) are sometimes used in drawing inferences about
a different population (the voting public on November 2). This assumes the
identity of the two populations as regards the characteristic under
investigation. The failure of the poll reported here is probably due to
the failure of this assumption on this particular occasion.




One of the important steps in the design of a sampling study is the
specification of the population to be studied. It may be that this can be
easily accomplished as in the case of a population of 7 X 15 four-ply tires
produced by a certain manufacturer where it is desired to estimate the
mean mileage for the population; or in the case of a population of
fourth-grade pupils enrolled in the Catholic parochial schools of a certain
state where it is desired to estimate the mean performance for the
population on a certain test of ability to spell. Quite often, however,
the specification of the population presents difficulties. Consider, for
example, a population of farms, where it is desired to estimate the mean
annual income for the population. The difficulty, of course, has to do
with the definition of a farm. Questionable cases will arise and the
investigator will be in doubt about whether or not a particular object
belongs to the population, that is, is a "farm." It is essential that the
population be specified to a point that eliminates such doubt. The
investigator is responsible for the development of a set of rules which
clearly determine whether or not a given object belongs to the population
under investigation. These rules, then, prescribe this population.


Sample 

By sample we mean a collection consisting of a part or a subset of the 
objects or individuals of a population which is selected for the express purpose 
of representing the population, that is, as a basis for making inferences about 
or estimates of certain population facts. 


The statement that a sample ought to be selected from the population
it is intended to represent may seem a truism. The fact is, however, that
the selection of a sample from the population involved may be impossible.
In the example of the Gallup Poll cited in the foregoing section, the
population consisted of individuals who voted in the presidential election
of November 2, 1948. At the time Dr. Gallup’s organization selected the
sample, this population did not yet exist. The sample was actually taken
from one population and used to represent another. This was, in this case,
done deliberately and with the hope that the two populations would be
sufficiently alike that generalizations extended to the population actually
sampled could also be extended to the then as yet nonexistent population
of actual voters. The erroneous forecast given by the poll on this
particular occasion was probably due to differences between these two
populations. On another occasion the populations may be sufficiently alike
to permit a generalization of this type to be accurate.

Another more extreme example of the use of a sample taken from one
population as a basis for drawing inferences about another is to be found
in medical experimentation conducted with animals. A sample of rats or
guinea pigs provides a basis for generalizations regarding, say, the effect
of some new drug upon weight for a population of such rats or pigs.
Populations of other animals (even human beings) may then be assumed to be
sufficiently like this population of rats insofar as the effect of the
treatment is concerned to permit the second generalization. Often, at
least at certain stages of theory development, this is the only
practicable means of experimentally checking theory. When this is the
case, it ultimately becomes essential for the investigator to obtain
comparative information about the two populations for the purpose of
determining whether generalizations may reasonably be extended from one to
the other.

A similar situation is often encountered in educational experimentation,
particularly in that having to do with the evaluation of the relative
effectiveness of two ways of doing something such as developing a
particular skill in arithmetic at, say, the fourth-grade level. The two
methods may be "tried out" on samples of fourth-grade children and one of
them may prove better than the other insofar as the samples of children
involved are concerned. To recommend this method in preference to the
other implies a generalization of sample fact to a population of children
who will be attending similar fourth grades in the future, a population
which is, of course, nonexistent at the time of the experiment. The
success of such a generalization depends upon the degree to which the
experiences and abilities of the members of this future population conform
and continue to conform to the experiences and abilities of the members of
the population studied.


Sampling Unit 


We have indicated that the populations we seek to study consist of a
number of individuals or objects. Each individual or object is a population
unit. For the purpose of selecting a sample the population is divided into
a number of parts called sampling units. These parts usually contain one or
more population units. No population unit may belong to more than one such
part and the aggregate of these parts is the whole population.

In the simplest situation, the population unit is the sampling unit. For
example, in the illustration of the population of tires previously
suggested, the sampling unit could be a single tire. On the other hand, in
the population of fourth-grade pupils enrolled in the Catholic parochial
schools of a certain state, the sampling unit might be a classroom, or
perhaps a school building, or even the children residing in some
governmental unit such as a township. It is clear that if the township
were used as the sampling unit any such given unit could conceivably
contain none, one or more population units (children).


Score 


To begin with we are, of course, interested in determining some
population fact. Our interest may be no more clearly defined than a
generally expressed desire to determine the life of a certain type of tire
produced by a certain company. What do we mean by the life of a tire? Do we
have in mind miles of wear before it becomes useless beyond repair? If so,
what kind of wear, on what kinds of roads, at what kinds of speeds, and
bearing what kinds of loads? What does the phrase “useless beyond repair"
mean? By what criteria may one judge this state in a tire? These are only
indicative of the many questions which must be answered if our general
purpose is to be satisfied.

Somehow we must in the case of each sampling unit comprising our sample
accurately count or measure the characteristic or trait in which we are
fundamentally interested. We shall call these counts or measurements
scores. These scores constitute the basic data from which our
generalizations will stem.

How such measurements should be taken—how valid and reliable they 
should be—are topics which will not be treated in this book. It is important 
to note, however, that the most erudite statistical analysis can only help 
to interpret the information contained in the scores. It can never add 
information. If the scores are inappropriately or inaccurately determined 
the study is doomed to failure and the money, time, and energy which have 
been expended will have been wasted. 


Parameter 

Parameter is the name given to the population fact we seek to estimate.
It is not the estimate we may obtain but the fact itself. It can be obtained
only by determining the scores for all of the units that comprise the
population. That is to say, a parameter is a population fact which depends
upon or is a function of the scores for all the population units.

For example, in our illustration regarding the life of a certain brand of
automobile tire, the pertinent population fact might be taken as the mean
number of miles of service the tires comprising the population would give
before wearing out. In this population, the unit is a tire of a certain size
manufactured by a certain company. The score for a given unit is the
number of miles of service it will give before wearing out. This, as has
previously been suggested, requires further definition. Let us assume, for
the sake of economy as well as of achieving uniformity of condition, that
the tires are tested on a machine designed to simulate use on a car, and
that the machine is calibrated to indicate for each tire tested the number
of miles that correspond to the time on the machine. Let us further assume
that a tire is defined, for the purpose of this investigation, to be worn
out when it first blows out. Then the score for a given unit (tire) is
obtained by placing the unit in the machine, setting the indicator dial to
zero, letting the machine run until the tire blows out, and reading from
the indicator dial the number of miles corresponding to the time the tire
was on the machine.
Now to obtain the population parameter in which we are interested— 


INTRODUCTION TO SAMPLING THEORY 2 37 


that is, the mean number of miles of service—we must obtain such a score 
for every unit in the entire population. The value of the parameter, then, is 
the mean of these scores. This implies exposing to machine wear until blown 
out each and every tire of the specified size produced by the particular 
manufacturer involved. But then there would be no tires to sell. Obviously 
the value of the parameter in this situation can never be determined prac- 
tically, and hence we are forced to be content with an estimate based on a 
sample of units taken from the population. In other situations reasons 
dictating the use of a sample estimate may include (a) the fact that the 
population is so large (i.e., consists of so many units) that it is physically
and/or economically impracticable to obtain the scores for all the popula- 
tion units, and (b) the fact that the real population units may actually be 
nonexistent at the time of the investigation (see under Sample above the 
example of the experiment designed to evaluate the relative effectiveness 
of two methods of teaching a particular arithmetic skill). 


Statistic


A statistic is a sample fact which depends upon the scores of the particular 
sampling units comprising a sample. That is, just as parameter was the 
name given to a population fact, so is statistic the name given to a sample 
fact. In our illustration regarding the wearing qualities of tires, a value
of the statistic corresponding to the parameter described could be
determined by selecting some number of units (tires) as a sample, by obtaining
the scores for the units thus chosen, and then by finding the mean of these 
scores. 

Now if there is one thing about which we may be certain it is that the 
units (tires) that comprise our population vary in durability. No matter 
how the manufacturer may have striven for uniformity of quality the fact 
remains that some of his tires will wear better than others. This being the
case it is clear that the value of the statistic in this example will depend 
upon the quality of the units chosen for the particular sample. If these 
units are on the whole more durable than usual, their mean (the statistic) 
will be large in relation to the population mean. If they are less durable, 
it will be small. Moreover—and this point is important—if a second sample 
of units were to be selected from this same population, and the scores of 
these units determined, it is extremely unlikely that the distribution of the 
values of these new scores would be identical with that of the scores of the 
units comprising the first sample. Hence the statistic based on the second 
sample would almost certainly differ in value from the previous statistic. 
Thus while a particular parameter can have one and only one value, the 
corresponding statistic is capable of assuming many different values. In
the case of any given population, then, the value of a parameter is a constant
while that of the corresponding statistic varies for different samples selected
from this population.

Sampling Error 

Sampling error is simply the difference between the value of a population 
parameter and that of the corresponding statistic. So that the direction of the 
error may be taken into account, this difference should always be deter- 
mined in the same way. The conventional procedure consists of subtracting 
the value of the parameter ($\theta$) from that of the statistic (S); that is, where
E represents sampling error,

$$E = S - \theta \tag{9.1}$$


This convention identifies sampling errors associated with under- 
estimates of the parameter as negative errors and those associated with 
overestimates as positive errors. 
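
As a minimal sketch of convention (9.1), the following Python fragment computes the sampling error of one sample mean. The population of tire lives, its center of 30,000 miles, and the sample size of 25 are assumptions invented for illustration. In practice the parameter is unknown, so the error could not actually be computed this way.

```python
import random
import statistics


def sampling_error(statistic_value, parameter_value):
    """Formula (9.1): E = S - theta."""
    return statistic_value - parameter_value


rng = random.Random(0)

# Hypothetical finite population of tire lives in miles (illustrative numbers only).
population = [rng.gauss(30000, 3000) for _ in range(10000)]
theta = statistics.mean(population)        # parameter: the population mean

sample = rng.sample(population, 25)        # one sample of 25 units
s = statistics.mean(sample)                # statistic: the sample mean
error = sampling_error(s, theta)

# A positive error marks an overestimate, a negative error an underestimate.
label = "overestimate" if error > 0 else "underestimate" if error < 0 else "exact"
print(f"S = {s:.1f}, theta = {theta:.1f}, E = {error:+.1f} ({label})")
```

Because the parameter is always subtracted from the statistic, the sign of the result carries the direction of the error.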


Sampling Distribution 

We have noted that the value of a statistic (i.e., of a particular sample 
fact) may be expected to vary from one sample to another even if the 
samples are selected by the same procedure from the same population. Let 
us suppose that by means of some prescribed procedure we select a sample
of, say, 100 units from some population, that we obtain some score for each
unit selected, and that we compute the value of some statistic for this
sample, as, for example, the mean of these 100 scores. Now suppose we
repeat this process again and again, in all a total of, say, 1,000 times, each
time selecting by the same procedure a new sample of 100 units from this
same* population, and each time determining the mean of the 100 scores
obtained for the selected set of sample units. The 1,000 statistics (means) 
thus obtained will, of course, vary somewhat from sample to sample. Now 
let us organize these 1,000 statistics into a relative frequency distribution. 
We have in this distribution a "start" toward the empirical derivation of a 
particular sampling distribution, i.e., the sampling distribution of the means 
of 100 scores obtained for samples of 100 units, each sample being selected 
according to the same prescribed procedure from the same population. 
Actually we can only claim to have made a start toward the empirical
derivation of this particular sampling distribution because the notion
involves the relative frequency distribution of the infinity of statistics (means)
which would arise from an infinity of repetitions of this particular sampling
routine. It represents a collection of the totality of all possible experience
with variation in the values of this particular statistic which arise from the
repeated application of the particular sampling procedure to the particular
population. A sampling distribution is, then, a theoretical construct. We
may be able to set up a model of one but we could never empirically derive
one.

*If the population is "very large" (i.e., infinite), the removal of the sample units will
not affect its character and it can be assumed, therefore, that each new sample is
selected from the same population as its predecessor, even though the units previously
selected were not returned to the population. If the population is not large it will, of
course, be necessary to return to it the units selected for any given sample before
selecting the succeeding sample in order to satisfy the condition that each sample be
selected from the same population.

It is important to note that although we have discussed the concept in 
terms of sample means, the notion is one which may be extended to any 
statistic. Thus, if the medians of each sample of 100 scores had been
obtained, we could have made a similar start toward the empirical derivation
of a sampling distribution of medians. In the same way it is possible to 
conceive of sampling distributions of semi-interquartile ranges, or of stand- 
ard deviations, or of percentages as, for example, percentages of individuals 
in the samples who indicate their intention to vote for a particular candidate 
for public office. 

We are now ready to state a somewhat more formal definition of a 
sampling distribution. The sampling distribution of a statistic is the relative
frequency distribution of an infinity of determinations of the value of this 
statistic, each determination being based on a separate sample of the same size 
and selected independently but by the same prescribed procedure from the 
same population. 
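
The "start" toward an empirical sampling distribution described above can be imitated on a computer. The sketch below is only an illustration under stated assumptions: the population (100,000 roughly normal scores with mean 50 and standard deviation 10), the interval width of 0.5, and the random seed are all invented here. Each call to `rng.sample` draws from the full population, so the units of one sample are in effect returned before the next sample is selected.

```python
import random
import statistics
from collections import Counter

rng = random.Random(1)

# Assumed population: 100,000 scores, roughly normal, mean 50 and SD 10.
population = [rng.gauss(50, 10) for _ in range(100_000)]

N, REPETITIONS = 100, 1000

# Repeat the same sampling routine 1,000 times, keeping the mean of each sample.
means = [statistics.mean(rng.sample(population, N)) for _ in range(REPETITIONS)]

# Organize the 1,000 means into a relative frequency distribution
# using intervals of width 0.5.
WIDTH = 0.5
counts = Counter(int(m // WIDTH) for m in means)
rel_freq = {k * WIDTH: c / REPETITIONS for k, c in sorted(counts.items())}

for lower, rf in rel_freq.items():
    print(f"{lower:5.1f} to {lower + WIDTH:5.1f}: {rf:.3f} {'#' * round(rf * 100)}")
```

However many repetitions are used, the result remains a finite approximation. The sampling distribution itself refers to an infinity of repetitions.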


Bias 


A sampling distribution like any ordinary frequency distribution may
be described (1) in terms of its placement along the scale of possible values
of the statistic (i.e., in terms of its average); (2) in terms of the extent to
which the values are spread or dispersed along the scale (i.e., in terms of
its variability); and (3) in terms of its symmetry, or skewness, or
peakedness, or flatness (i.e., in terms of its form). If the mean of the sampling
distribution of a statistic coincides with or equals the corresponding
population parameter, it (the statistic) is said to be unbiased. If, on the other hand,
the mean of its sampling distribution does not coincide with the parameter,
it is said to be biased.


It is important not to confuse bias and sampling error. Sampling error
refers to the difference between the value of the statistic for a particular
sample and that of the corresponding parameter. Bias, on the other hand,
does not refer to a particular sample result but rather to the difference
between the parameter and the mean of the results (values of the statistic)
of an infinity of such samples. In other words, bias refers to the overall or
long-run tendency of the sample results to differ from the parameter in a
particular way. Obviously, the presence of bias in a sampling investigation
is a thing either to be avoided or fully taken into account.

There are two ways in which bias may arise. The most troublesome
way is as a result of the method of sample selection. Suppose, for example,
that the procedure used in selecting the sample in the automobile tire
sampling illustration given above somehow resulted in tires which tended
to be more durable than usual. Then the mean of the sampling distribution
of the statistic* will be larger than the parameter, the bias here being due
to the sampling procedure. To say that the procedure tends to produce
samples of tires that are more durable than usual is not to say that this is 
true of each and every sample. Some samples produced by the procedure 
may involve tires whose average durability is the same as that of the pop- 
ulation. The sampling error in the case of such samples is, of course, zero. 
Occasionally the procedure may result in tires having an average durability 
which is even lower than that of the population. The sampling error in the 
case of such samples is in an opposite direction from the bias. This illus- 
trates why the term bias is not applicable to the result of a single sample. 
Bias refers, instead, to long-run tendency as reflected by the average (mean)
outcome of an infinity of samples.

Bias due to method of sample selection is troublesome because there is 
no way to assess its magnitude and consequently there can be no way to
make due allowance for it or to take it into account in interpreting the 
sample results. Designers of sampling procedures do not purposely try to 
introduce bias. In fact, unless they are completely dishonest and seeking 
to practice deceit, they will take every precaution to avoid it. But bias in 
sampling procedure can be extremely subtle and may escape entirely the 
notice of the sampler until, too late, he is brought up short by some incon- 
sistency in his results. To design sampling routines that are free of bias is 
not always an easy undertaking. Some attention is given this problem in
the following section. 

The second source of bias is a less troublesome one. It has to do with 
the character of the statistic itself. Some statistics are of such a nature 
that the means of their sampling distributions will inherently differ from 
the corresponding population parameter in spite of the fact that the sam- 
pling procedures involved are free from bias. The sample range, for exam- 
ple, could never possibly exceed the population range—it could at most 
equal it, and then only if both the smallest and largest population values 
happened to be involved in the particular sample. Hence, the mean of a
sampling distribution of ranges is bound to be smaller than the population
range so that the sample range illustrates an inherently biased statistic.
Bias inherent in a statistic is not a troublesome problem like bias arising 
from a sampling procedure because it is possible to deduce mathematically 
its direction and magnitude. When the direction and magnitude of a bias 
are known it is a simple matter to make allowance for it in interpreting 
results. 
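
The contrast between an unbiased statistic (the sample mean) and an inherently biased one (the sample range) can be seen in a small simulation. This is a sketch only: the population (the integers 1 through 1,000), the sample size of 20, and the 5,000 repetitions are assumptions chosen for illustration, and a finite number of repetitions can only approximate the mean of a sampling distribution.

```python
import random
import statistics

rng = random.Random(2)

# Assumed population: the integers 1..1000 (mean 500.5, range 999).
population = list(range(1, 1001))
pop_mean = statistics.mean(population)
pop_range = max(population) - min(population)

N, REPETITIONS = 20, 5000
sample_means, sample_ranges = [], []
for _ in range(REPETITIONS):
    sample = rng.sample(population, N)     # simple random sample, no bias in selection
    sample_means.append(statistics.mean(sample))
    sample_ranges.append(max(sample) - min(sample))

# Averages over many repetitions stand in for the means of the sampling distributions.
mean_of_means = statistics.mean(sample_means)
mean_of_ranges = statistics.mean(sample_ranges)

print(f"population mean  {pop_mean:8.1f}   average sample mean  {mean_of_means:8.1f}")
print(f"population range {pop_range:8.1f}   average sample range {mean_of_ranges:8.1f}")
```

No single sample range can exceed the population range, so the average sample range must fall below it, while the average sample mean settles near the population mean.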
*The statistic in this example is itself a mean, namely, the mean of the numbers of 
miles of use the tires in a sample will give before blowout occurs. The mean of the 
sampling distribution, on the other hand, is the mean of an infinite collection of such 
sample means.


Standard Error

We have already discussed briefly the use of measures of variability as
indexes of the reliability of a measuring or sample estimating procedure
(see Section 6.8). In so doing we pointed out that neither errors of
measurement nor sampling errors could ever be determined quantitatively
inasmuch as their determination would require knowledge of the true value
(parameter) being measured or estimated. We suggested that the
consistency of the results arising from repetition of a given measuring or
sampling procedure would provide a useful basis for evaluating the
reliability or accuracy of that procedure. In keeping with this suggestion we
shall use as a quantitative index of the accuracy of a sampling procedure
the standard deviation of the sampling distribution of the statistic
involved, or what is the same thing, the standard deviation of the sampling-error
distribution. Since this standard deviation is used as an index of the
degree of reliability with which a parameter may be estimated by a statistic
derived from repeated application of a particular sampling procedure, and
since it may be thought of as the standard deviation of the distribution of
sampling errors it is called a standard error. That is, the standard error of
any statistic is the standard deviation of its sampling distribution.

Of course, since the sampling distribution of a statistic is a theoretical
construct, the standard error of a statistic must also be a theoretical
construct. We can, however, estimate its value for any statistic derived from
a specified sampling procedure by actually effecting some number of
repetitions of this procedure. Here we are in effect regarding the sampling
distribution as a hypothetical population of values from which we select a
sample by repeating a specified sampling routine and determining the value
of the statistic for each repetition. The values of the statistic which
comprise this sample may then be used to estimate a particular parameter
(the standard deviation) of this hypothetical population (the sampling
distribution). As we shall later learn, it is possible in the case of certain
statistics based on samples selected in a certain way to obtain useful estimates of
their standard errors from the information contained in a single sample,
thus saving the necessity of possible costly repetition of a sampling routine.
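
The two routes to an estimate of a standard error mentioned above can be compared in a short sketch. Everything specific here is an assumption made for illustration: the population (50,000 scores with mean 100 and standard deviation 15), the sample size of 64, and the 2,000 repetitions. The single-sample estimate uses the commonly taught expression s/sqrt(N) for the standard error of a mean, which this section has not yet introduced.

```python
import math
import random
import statistics

rng = random.Random(3)

# Assumed population of scores; treated as the universe being sampled.
population = [rng.gauss(100, 15) for _ in range(50_000)]
N = 64

# (a) Repeat the sampling routine and take the standard deviation
#     of the resulting values of the statistic (here, the mean).
means = [statistics.mean(rng.sample(population, N)) for _ in range(2000)]
se_from_repetition = statistics.stdev(means)

# (b) Estimate from a single sample: s / sqrt(N), the usual estimate for a mean.
one_sample = rng.sample(population, N)
se_from_one_sample = statistics.stdev(one_sample) / math.sqrt(N)

print(f"estimated from 2000 repetitions: {se_from_repetition:.3f}")
print(f"estimated from a single sample:  {se_from_one_sample:.3f}")
```

Both figures are estimates of the same theoretical quantity, which for this assumed population is close to 15/8.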


9.3 SELECTING THE SAMPLE 

There are many ways in which a sample may be selected. In the
example about the automobile tires, for instance, we might simply go to the
company stock pile and take from it the needed number of most
conveniently accessible tires. Or we might instead go to the end of the plant
production line and take the needed number of tires in succession as they
come off the line. The usefulness of these or any other procedures depends
upon the effectiveness with which the resulting samples represent the
population involved. Both procedures cited, for example, ignore such tires as
may have been in the possession of retail dealers for an appreciable period 
of time. In other words tires thus chosen represent only the more recently 
manufactured portion of the population. If the wearing qualities of a tire 
are in any way a function of recency of manufacture then samples chosen 
according to the above procedures will be biased in the direction of the 
qualities which are characteristic of only the more recently manufactured 
tires. 

The task of devising procedures of selection that will result in samples 
which are free from bias, as we have already pointed out, is extremely dif- 
ficult and subject to subtly concealed sources of error. There is no sub- 
stitute for a soundly conceived plan. Not even the use of an extremely 
large sample can be counted upon to mitigate the bias arising from an in- 
valid sampling scheme. In 1936, for example, the editors of a weekly news 
periodical known as the Literary Digest undertook to forecast the outcome 
of the presidential election of that year. They put their faith in sample 
size, believing, it would seem, that if a sample were simply made big enough 
the manner of its selection would be immaterial. They obtained straw 
ballots from some 10,000,000 people using telephone directories as the
primary source of names. This procedure not only ignored non-telephone
subscribers but resulted in the inclusion in the sample of disproportionate
numbers in the older age groups. Since the issues of the 1936 campaign
were drawn largely along economic lines, it is not surprising that the
forecast based on this sample was a victory for Landon. Roosevelt's sweep of
all states save Maine and Vermont and the Literary Digest's subsequent 
failure are a matter of record. 

But procedural errors in selecting samples are not always so obvious. 
Even the foregoing example has been oversimplified and incompletely re- 
ported. It is, perhaps, unfair to imply that the conductors of the Literary 
Digest poll of 1936 were blind to the possibility that their technique of 
sample selection would result in the inclusion of a disproportionate
number of individuals favoring the economic philosophy of the Republican
party. Besides their faith in the extreme size and the widespread distribu- 
tion of their sample (names were selected from every telephone book in the 
United States), they could point with pardonable pride to the past success 
of their technique. In 1932, for example, another election year in which 
economic issues were paramount, the same sampling technique produced a
phenomenally accurate forecast of the outcome of the presidential election. 
How is it possible, then, that a scheme which proved so satisfactory in fore- 
casting one election failed so miserably in another? Post-mortem analysis 
provided an answer. Unlike modern polls which make use of interviewers, 
the Literary Digest poll was conducted through the mails, and this
procedure, of course, leaves the return of the ballot to the whim of the
recipient. It is now known that members of the party out of power are far
more likely, perhaps as a form of voicing protest, to return such ballots
than are members of the “ins.”* In 1932, therefore, a far greater propor- 
tion of the then out-of-power Democrats receiving Literary Digest ballots 
returned them than did the Republicans receiving these ballots. Thus the 
bias resulting from the use of the phone directory as a primary source of 
sampling units was canceled by the opposite bias resulting from the use of 
the mails in collecting the straw ballots. In 1936, on the other hand, the 
then out-of-power Republicans returned the ballots in greater proportion 
and the two sources of bias, instead of canceling each other out, became 
additive. The bias which was due to allowing an individual to decide for 
himself whether or not he is to be included in the sample is one that most 
present-day designers of sampling studies seek to avoid. It is not uncom- 
mon, however, particularly in the case of questionnaire studies, to find this 
method of sampling extant today, which explains in part the skepticism 
with which the results of such studies are generally regarded. To the 
operators of the Literary Digest poll this particular source of error (bias) 
was apparently unknown and, while obvious enough once pointed out, it 
illustrates the subtlety of sources of bias against which the sampler must be 
continually on guard. 

In general sampling schemes may be classified according to two types: 
(1) those in which sample elements are automatically selected by some 
scheme under which a particular sample of a given size from a specified 
population has some known probability of being selected; and (2) those in 
which the sample elements are arbitrarily selected by the sampler because 
in his judgment the elements thus chosen will most effectively represent
the population. Samples of the first type are known as probability samples
while those of the second type are referred to as judgment samples.† Of
these two general types of sampling procedure only the first is amenable 
to the development of any theory regarding the magnitudes of the sampling 
errors which may be expected in a given situation. 

In this book we shall confine our attention to a special case of proba- 
bility sampling known as simple random sampling. Simple random sampling 
refers to a method of selecting a sample of a given size from a given population
in such a way that all possible samples of this size which could be formed from
this population have equal probabilities of selection. For example, suppose the
population to consist of only five elements named a, b, c, d, and e. It is
possible to form ten samples of two elements each from this population so 
that the probability of any sample in this universe of samples is .1. The 
ten possible samples are: 


1. ab 3. ad 5. bc 7. be 9. ce 
2. ac 4. ae 6. bd 8. cd 10. de 
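
The listing above can be reproduced mechanically. The following sketch uses Python's `itertools.combinations` to enumerate every sample of two elements and then picks one of them with equal probability. The seed value is an arbitrary assumption made only so that the run is repeatable.

```python
import random
from itertools import combinations

elements = ["a", "b", "c", "d", "e"]

# Every possible sample of two elements, order within a sample ignored.
samples = ["".join(pair) for pair in combinations(elements, 2)]
print(samples)

# With ten equally likely samples, each has probability 1/10.
probability = 1 / len(samples)
print(probability)

# Select one sample so that all ten have the same chance of being chosen.
chosen = random.Random(4).choice(samples)
print(chosen)
```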


*J. D. Cahalan, Literary Digest Presidential Poll, Unpublished Master's Thesis, State 
University of Iowa, 1936. 


†W. E. Deming, Some Theory of Sampling (New York: John Wiley & Sons, Inc., 1950).


Next we must prescribe some procedure for selecting one of these samples 
such that if the procedure is repeated an infinity of times each of these 
possible samples will occur with the same relative frequency in the new 
hypothetical universe thus generated—a new universe representing the 
totality of all possible experience with this sampling procedure in this
situation (see Section 8.6).

We might, for example, write each sample identification number on 
one of ten identical cards or slips of paper, place them in some container, 
mix them thoroughly, and then, blindfolded, draw one of the cards or slips
of paper from the container. Since, with repetition of this procedure we 
would expect each sample to be drawn one-tenth of the time in the long run 
(i.e., in the infinity of repetitions) the resulting sample would be, by defini- 
tion, a simple random sample. 

Actually, to draw a simple random sample, it is not necessary to iden- 
tify all possible samples as in the above example. It is sufficient to identify 
the elements in the population and then, as a first step, to draw a single 
element by a procedure of the character just suggested. The single element 
thus chosen is then, by definition, a simple random sample of one case or 
object taken from the given population. The element thus chosen is set 
aside as the first member of the sample to be drawn. The process is then 
repeated with what is left of the population. That is, from the new
population, which differs from the original only in that it lacks the element just
drawn, a second element is chosen by this same procedure. This element 
is also set aside as a member of the sample desired. This process is repeated 
until a sample of the desired size is attained. While we shall not attempt 
here to detail the argument involved, it can be shown that this procedure 
fully complies with the definition of simple random sample previously 
stated and that the probability associated with a sample selected from a
given population by this latter procedure has the same numerical value as
that of a sample selected by the procedure which requires the identification
of all possible samples.
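
Although the argument is not detailed here, the claim can at least be checked for the five-element example. The sketch below is an illustration, not a proof: the function name `draw_sequentially`, the seed, and the number of trials are assumptions introduced for this purpose. A given pair can arise in two orders, each with probability (1/5)(1/4), which gives 1/10, and a simulation of the element-by-element procedure gives relative frequencies near that value.

```python
import random
from collections import Counter
from fractions import Fraction
from itertools import combinations


def draw_sequentially(population, size, rng):
    """Draw one element at a time from what is left of the population."""
    remaining = list(population)
    sample = []
    for _ in range(size):
        index = rng.randrange(len(remaining))   # each remaining element equally likely
        sample.append(remaining.pop(index))     # set the element aside
    return sample


# Exact probability of one particular pair: two possible orders of drawing.
exact = 2 * Fraction(1, 5) * Fraction(1, 4)
print("exact probability of any given pair:", exact)

rng = random.Random(5)
elements = ["a", "b", "c", "d", "e"]
TRIALS = 100_000
tally = Counter(
    "".join(sorted(draw_sequentially(elements, 2, rng))) for _ in range(TRIALS)
)

# Observed relative frequency of each of the ten possible samples.
for pair in ("".join(p) for p in combinations(elements, 2)):
    print(pair, round(tally[pair] / TRIALS, 3))
```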

To effect the procedure just suggested we must not only assign some 
identifying number to each population element but we must also prepare 
for each element a corresponding card or slip of paper bearing this number. 
Except for this number, these cards must be made as nearly identical as 
possible in order to avoid any effect that physical differences in the cards
might conceivably have upon the long-run frequency with which some
would be selected. Obviously the task of preparing such cards can become
a tedious one. The practical difficulties associated with it mount as the
population becomes large. To circumvent this task, tables of random
numbers have been developed to take the place of cards or slips of paper.

A table of random numbers is simply a large collection whose elements 
are the ten digits. Such a table is constructed by a method which, if 
continued indefinitely, would generate a universe in which the ten digits
would not only occur with equal frequencies (i.e., have equal probabilities)
but would also be arranged in a random order. We shall not attempt here
to define random order. It implies that if N digits are read successively
either by rows or by columns from any arbitrarily selected starting point
in the table, the N digits thus read would correspond to a simple random
sample from the universe. A small table of random numbers is given in 
Appendix C, pages 512-517. 

To use a table of random numbers to select a simple random sample of, 
say, 25 objects from a universe of, say, 1,000 objects, it is first necessary to 
identify each object in the universe by the successive numbers in the series 
000, 001, 002, ..., 999. Then, choosing any three columns (or rows) of the
table (it is usually most convenient to use successive columns) and
arbitrarily selecting any row of these columns as a starting point, record the first
25 successive rows of three digits appearing in these columns. Now take 
as the sample from the universe the 25 objects whose identification numbers 
correspond to the 25 numbers thus recorded. It may be necessary to 
record more than 25 numbers if it develops that some of the numbers 
recorded are the same. 
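
The table-reading routine just described can be sketched as follows. The printed table of Appendix C is not reproduced here. Instead, a stand-in table of pseudo-random digits is generated, and the function name `select_with_digit_table`, the table dimensions, the starting row, and the column choice are all assumptions made for illustration.

```python
import random


def select_with_digit_table(digit_rows, first_column, start_row, sample_size):
    """Read three-digit numbers down three adjacent columns, skipping repeats."""
    chosen = []
    row = start_row
    while len(chosen) < sample_size:
        if row >= len(digit_rows):
            raise ValueError("ran off the end of the table; use a longer table")
        number = int(digit_rows[row][first_column:first_column + 3])
        if number not in chosen:       # a number already recorded is passed over
            chosen.append(number)
        row += 1
    return chosen


# Stand-in for a printed table: 200 rows of 10 pseudo-random digits each.
rng = random.Random(6)
table = ["".join(rng.choice("0123456789") for _ in range(10)) for _ in range(200)]

# Objects in the universe are assumed to be numbered 000 through 999.
ids = select_with_digit_table(table, first_column=2, start_row=17, sample_size=25)
print([f"{i:03d}" for i in ids])
```

Skipping a repeated number and reading one more row corresponds to recording more than 25 numbers when some of those recorded are the same.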

While this technique saves the preparation of cards and the invention 
of some scheme for mixing and drawing from among them, it does not 
eliminate the task of numerically identifying each element of the
population. This in itself may be a practical impossibility either because of the
size of the population or because of the inaccessibility or current non- 
existence of some of its elements. In such situations either all or some of 
the available elements may be used as a sample. When only some are used, 
these should be selected at random from those available. Although such 
samples do not comply with our definition of simple random sample, they 
are often treated as such. That is, they are simply assumed to be random 
samples of the population or universe involved. When this is done it clearly 
becomes the responsibility of the investigator to defend the validity of this 
assumption. 


9.4 SAMPLING THEORY AS IT APPLIES TO THE MEANS OF
RANDOM SAMPLES


By sampling theory as it applies to some statistic, we refer to the nature 
and characteristics of the theoretical sampling distribution of this statistic. 
In other words we refer to a description of the theoretical totality of experi- 
ence with the values of this statistic which arise when a given sampling 
scheme is repeatedly applied to a given population. The development of 
such theory is the work of the mathematical statistician who is often 
forced to employ advanced mathematical procedures to accomplish this 
purpose. Throughout this text we shall limit our treatment of sampling-error
theories as they apply to selected statistics to a description of the
mathematician's findings without any attempt at presenting the
mathematical bases. We shall, moreover, confine our attention to the theory
as it has been developed for infinitely large populations. This is not as 
restricting as might be presumed. In the first place, unless the popu- 
lations are quite small and the sample so large as to take in a substantial 
portion of the population there is very little practical difference in the 
theory for finite and infinite populations. In the second place, such errors 
as will occur in estimating the reliability of a sampling routine applied 
to a finite population will tend to be on the conservative side. That is, 
estimates of standard error based on the theory developed for infinite 
populations will tend to be too large when this theory is applied to finite 
populations. Finally most of the populations of concern to psychologists 
and educators are either quite large or are hypothetical in character. If 
the latter is the case, it is usually logically defensible to extend the 
hypothetical notion of the population to make it very large if not infinite. 
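
The remark that infinite-population theory errs on the conservative side can be illustrated with the standard finite population correction for the mean of a simple random sample drawn without replacement. That correction is not given in this text; it is brought in here as an outside assumption, along with the function names and the numerical values of sigma, sample size, and population sizes.

```python
import math


def se_infinite(sigma, n):
    """Standard error of the mean under infinite-population theory."""
    return sigma / math.sqrt(n)


def se_finite(sigma, n, population_size):
    """Same, multiplied by the usual finite population correction factor."""
    fpc = (population_size - n) / (population_size - 1)
    return sigma / math.sqrt(n) * math.sqrt(fpc)


sigma, n = 10.0, 100
for population_size in (200, 1_000, 100_000):
    print(
        f"population {population_size:>7}: "
        f"infinite theory {se_infinite(sigma, n):.4f}, "
        f"finite theory {se_finite(sigma, n, population_size):.4f}"
    )
```

The two values differ appreciably only when the sample takes in a substantial portion of the population, and the infinite-population value is never the smaller of the two.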

In this section we shall be specifically concerned with the sampling 
theory which has been developed for the mean of a simple random sample. 
While some concrete illustrations are cited, consideration of the practical 
applications of this theory is deferred to the two subsequent chapters. 

We shall consider first the case in which the population of scores (X’s) 
involved is normally distributed.* We shall represent the mean and
variance of this population of scores by $\mu$ and $\sigma^2$ respectively. Now let a
simple random sample of N scores be selected from this population and let
the mean of these N scores be represented by $\bar{X}$. Then the mathematical
statisticians have rigorously demonstrated the fact that were this sampling
procedure to be repeated an infinity of times the resulting infinite collection
of $\bar{X}$-values would also be normally distributed with mean $\mu$, i.e., with a
mean having the same value as that of the population from which the
samples were selected.
Intuitively it would seem that the variability of this theoretical
collection of $\bar{X}$-values should depend (1) upon the variability of the scores
comprising the population, and (2) upon the size of the sample, N. The greater
the variation among the population scores the more variation we would
naturally expect to observe among the $\bar{X}$-values. On the other hand, a
large sample provides a more precise estimate of $\mu$ than a small one, so that
the larger the value of N the less variation we would expect to observe
among the $\bar{X}$-values. In other words, we would expect the degree of
variation among the $\bar{X}$-values to be directly proportional to the degree of
variation among the population scores and inversely proportional to the size of
the sample. Mathematical statisticians have in fact shown that the
variance of the $\bar{X}$-sampling distribution ($\sigma_{\bar{X}}^2$) is directly proportional to the
variance of the population ($\sigma^2$) and inversely proportional to the size of
the sample (N).

*For example, the heights in inches of all nine-year-old Canadian boys; or the
intelligence scores in mental-age units of all ten-year-old girls in the state of New York.

By way of summary we shall express the foregoing theory in the form 
of a rule. 


RULE 9.1. The sampling distribution of means ($\bar{X}$) of simple random
samples of N cases taken from a normally distributed population with mean $\mu$
and variance $\sigma^2$ is a normal distribution with mean $\mu$ and variance

$$\sigma_{\bar{X}}^2 = \frac{\sigma^2}{N} \tag{9.2}$$

RULE 9.1a. The standard error of the sampling distribution of Rule 9.1 is

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}} \tag{9.3}$$


By way of concrete illustration consider the theoretical population of
height scores of nine-year-old Canadian boys which is pictured by the
normal distribution shown in Figure 8.17. Then for random samples of 100
height scores selected from this population the sampling distribution of the
statistic $\bar{X}$ should be a normal distribution with mean at 51.7 and a variance
of $(2.35)^2/100 = .0552$, or a standard error of $2.35/\sqrt{100} = .235$ (see
Figure 9.1). From this distribution we may note, for example, that in the
long run 68.26 per cent of the means of random samples of 100 cases selected
from this population will involve sampling errors of less than .235 inches;
or that the probability of a sample mean being in error by .46 inches (i.e.,
1.96 × .235) or more is 0.05.

[Figure 9.1: normal curve on the $\bar{X}$-scale (inches), centered at $\mu$ = 51.7]

FIGURE 9.1 Sampling distribution of means of samples
of 100 cases selected at random from a normally distributed
population having $\mu$ = 51.7 and $\sigma$ = 2.35
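
The arithmetic of this illustration can be verified with Python's standard library. The sketch assumes only the values quoted in the text (mean 51.7, standard deviation 2.35, samples of 100) and uses `statistics.NormalDist` as the normal model. The variable names are introduced here for illustration.

```python
import math
from statistics import NormalDist

mu, sigma, N = 51.7, 2.35, 100

variance_of_mean = sigma ** 2 / N          # formula (9.2)
standard_error = sigma / math.sqrt(N)      # formula (9.3)

# Normal model of the sampling distribution of the mean.
sampling_dist = NormalDist(mu, standard_error)

# Proportion of sample means within one standard error of mu.
within_one_se = (
    sampling_dist.cdf(mu + standard_error) - sampling_dist.cdf(mu - standard_error)
)

# Probability of a sampling error of 1.96 standard errors or more, either direction.
limit = 1.96 * standard_error
beyond_limit = 1 - (sampling_dist.cdf(mu + limit) - sampling_dist.cdf(mu - limit))

print(f"variance of the mean: {variance_of_mean:.4f}")
print(f"standard error:       {standard_error:.3f}")
print(f"within one SE:        {within_one_se:.4f}")
print(f"error of {limit:.2f} or more: {beyond_limit:.4f}")
```

The small difference between the computed proportion and the 68.26 per cent quoted above reflects rounding in printed normal-curve tables.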

It should be obvious that the applicability of the foregoing theory
must necessarily be quite limited, inasmuch as we may expect to find
relatively few of the “real world” populations in which we are interested 
to be normally distributed. Fortunately there exists a useful theory which 
has a much more extensive range of applicability. This theory is the 
subject of Rule 9.2. 


RULE 9.2. The sampling distribution of means (X̄) of simple random
samples of N cases taken from any (infinite) population having mean μ and
finite variance σ² approaches a normal distribution with mean μ and variance
σ²/N as N increases.

RULE 9.2a. The standard error of the sampling distribution of Rule 9.2 is

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}} \tag{9.4}$$


This theory, which belongs to the class of theories labeled "large-sample
theory," differs from that previously stated in that it is applicable to any
(infinite) population whatever, regardless of the form of the score
distribution, so long as the variance of this population is finite. Since
almost any population in which we are likely to have a practical interest
will have a finite variance, the theory becomes almost completely general in
its applicability. Nevertheless this theory leaves something to be desired.
Its shortcoming lies in the fact that it is extremely difficult to say just
how large N must be in order for the normal distribution to provide a
sufficiently accurate model. If the population distribution is nearly normal
(say, bell-shaped) the theory is sufficiently accurate even when N is quite
small. If, on the other hand, the population distribution is far from normal
(say, J-shaped) a much larger N is necessary to justify the application of
the normal approximation. Empirical investigations have shown that for most
of the populations encountered N = 50 is sufficient to warrant the use of
this theory.
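The behavior described by Rule 9.2 can be examined by simulation. The sketch
below is only an illustration under stated assumptions: the population is
taken to be exponential with mean 1 and variance 1 (a markedly J-shaped
distribution chosen for this example, not one discussed in the text), and the
function name, seed, and number of replications are arbitrary choices.

```python
import random
from math import sqrt
from statistics import pstdev


def simulate_sample_means(n: int, replications: int, seed: int = 0) -> list[float]:
    """Means of `replications` random samples of n cases, each drawn from an
    exponential population with mean 1 and variance 1 (assumed population)."""
    rng = random.Random(seed)
    return [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(replications)
    ]


means = simulate_sample_means(n=50, replications=4000)
observed_mean = sum(means) / len(means)   # Rule 9.2 says this is near mu = 1
observed_se = pstdev(means)               # and this is near sigma / sqrt(N)
theoretical_se = 1.0 / sqrt(50)
print(round(observed_mean, 3), round(observed_se, 3), round(theoretical_se, 3))
```

A histogram of `means` could be compared with the normal curve having the
same mean and standard error; repeating the run with a smaller `n` gives some
feeling for how slowly a skewed population yields to the normal model.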


9.5 SAMPLING THEORY AS IT APPLIES TO THE MEDIAN OF
RANDOM SAMPLES


RULE 9.3. The sampling distribution of medians (mdn) of simple
random samples of N cases taken from any continuous (infinite) population
having median ξ approaches a normal distribution with mean ξ as N
increases.

RULE 9.4. If y_ξ is the ordinate of the population probability distribution
curve at the median (ξ) the variance of the sampling distribution of Rule
9.3 is

$$\sigma^2_{mdn} = \frac{1}{4\,y_{\xi}^{2}\,N} \tag{9.5}$$

RULE 9.4a. The standard error of the median is

$$\sigma_{mdn} = \frac{1}{2\,y_{\xi}\sqrt{N}} \tag{9.6}$$

*This important result (Rule 9.2) is known to statisticians as the
Central-Limit Theorem.


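RULE 9.4b. If the population is normally distributed with standard
deviation σ the standard error of the median is

$$\sigma_{mdn} = \sqrt{\frac{\pi}{2}}\,\frac{\sigma}{\sqrt{N}} \approx \frac{1.25\,\sigma}{\sqrt{N}} \tag{9.7}$$

Proof. If the population distribution is as specified by (8.1) then

$$y_{\xi} = \frac{1}{\sigma\sqrt{2\pi}}$$

and direct substitution into (9.6) gives (9.7).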

This theory is very similar to that which applies to the mean. The
remarks made at the close of the foregoing section with regard to sample
size apply here as well. It will be observed that the theory does not require
that the population variance be finite as does that expressed in Rule 9.2.
On the other hand, it is limited to use with scores representing measures of
continuous attributes, whereas the theory pertaining to the mean is
applicable to both discrete and continuous data. It is also important to note
that when the population involved is normally distributed, the sample mean is
a more reliable estimate of μ than is the sample median, the standard error
of the median being approximately one and one-fourth times as large as
that of the mean (9.7). It is for this reason that the mean is usually
preferred over the median as the statistic in sampling studies having to do
with the characteristic of central tendency.

To illustrate this theory we shall again make use of the theoretical
population of height scores of nine-year-old Canadian boys shown in
Figure 8.17. In the preceding section we saw that the sampling distribution
of X̄ for random samples of 100 taken from this population was normal with
mean at 51.7 and a standard error of .235. Since the median of this
population of height scores is also 51.7 it follows from Rule 9.3 that the
sampling distribution of the median for random samples of 100, taken from
this population, is also a normal distribution with a mean (or median) of
51.7. Moreover, since the population from which the samples are taken is
itself normally distributed, it further follows from Rule 9.4b that the
standard error of this sampling distribution is approximately
1.25 × .235 = .294. This theoretical sampling distribution is pictured in
Figure 9.2. From this distribution we may note, for example, that the
probability of a sample median being in error by .576 (i.e., 1.96 × .294)
or more is 0.05.


[Figure: normal curve centered at ξ = 51.7 on the mdn-scale, with
σ_mdn = .294]

FIGURE 9.2 Sampling distribution of medians of samples
of 100 cases selected at random from a normally distributed
population having μ = 51.7 and σ = 2.35
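The comparison between the mean and the median under Rule 9.4b can be checked
by simulation. The following is a rough sketch, not a derivation: it draws
repeated samples of 100 from a normal population with μ = 51.7 and σ = 2.35
(the values of the illustration) and compares the observed standard errors.
The function name, seed, and number of replications are assumptions made for
the example.

```python
import random
from statistics import median, pstdev


def simulate_means_and_medians(n, replications, mu, sigma, seed=0):
    """Return the means and the medians of repeated random samples of n
    cases drawn from a normal population with the given mu and sigma."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(replications):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(sum(sample) / n)
        medians.append(median(sample))
    return means, medians


means, medians = simulate_means_and_medians(100, 4000, mu=51.7, sigma=2.35)
se_mean = pstdev(means)        # theory: .235
se_median = pstdev(medians)    # theory: about 1.25 x .235 = .294
ratio = se_median / se_mean    # theory: about 1.25
print(round(se_mean, 3), round(se_median, 3), round(ratio, 2))
```

Because Rule 9.4 is a large-sample result, the observed ratio for samples of
100 should be expected to fall near, rather than exactly at, 1.25.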


9.6 SAMPLING THEORY AS IT APPLIES TO A PROPORTION


Consider a population consisting of only two types or classes of objects
or individuals (units). For example, the population may consist of just
two types of fourth-grade pupils: those who can spell a given word and
those who cannot, or those who correctly answer a particular test question
and those who do not, or those who have had mumps and those who have
not, etc. Or the population may consist of voters who vote for Candidate A
and those who do not, or of United States citizens who are church
members and those who are not, or of teen-agers who are delinquent (a
definition of delinquency is, of course, necessary) and those who are not.
Populations of this type, that is, populations whose units may be classified
into one or the other of two mutually exclusive classes, are known as
dichotomous populations.

Suppose that for some such population we wish to determine the
proportion of units belonging to one of the two classes, but that it is
impractical for us to examine all of the units comprising the population. We
can obtain an approximation of the value of the desired proportion by
selecting a sample from the population, counting the sample units belonging
to this class, and expressing this count as a proportion of the number of
units in the sample. It should be recognized, of course, that this sample
proportion may involve a sampling error and that a repetition of the sampling
procedure would almost certainly yield a proportion which would differ from
that of the first sample. In fact, all the sampling-error concepts which we
have thus far developed may be applied to the sample proportion considered
as a statistic. We shall state the sampling error theory as it applies to a
proportion in the form of a rule.


RULE 9.5. Given an infinite dichotomous population, the units of which
either do or do not belong to Class A. Let φ represent the proportion of A's
in this population. Then, as N increases, the sampling distribution of the
proportion (p) of A-type units in random samples of N units taken from this
population approaches a normal distribution with mean φ and variance

$$\sigma^2_{p} = \frac{\phi(1-\phi)}{N} \tag{9.8}$$

RULE 9.5a. The standard error of the sampling distribution of Rule 9.5 is

$$\sigma_{p} = \sqrt{\frac{\phi(1-\phi)}{N}} \tag{9.9}$$


As an example of this theory, consider a population of school pupils
40 per cent of whom can solve a given test exercise correctly. Here these
pupils are the A's and φ = .4. Then for random samples of 600 pupils
taken from this population, the sampling distribution of p, the proportion
of A's in a sample, is a normal distribution (approximately) with mean at
.4 and standard error .02 (i.e., σ_p = √((.4)(.6)/600) = .02). This
theoretical sampling distribution is pictured in Figure 9.3. From this
distribution we may note, for example, that the probability of a sample
proportion being in error by .052 (i.e., 2.58 × .02 = .052) or more is
approximately 0.01.

[Figure: normal curve centered at φ = .40 on the p-scale]

FIGURE 9.3 Sampling distribution of a proportion (p) for
random samples of 600 units selected from a dichotomous
population containing .4 A's
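The figures in this example can be reproduced, and the adequacy of the normal
approximation judged, with a short computation. The sketch below is offered
under the assumption that the exact sampling distribution of the count of A's
is binomial, which the text does not state explicitly; the function names are
illustrative only.

```python
from math import comb, sqrt


def standard_error_of_proportion(phi: float, n: int) -> float:
    """Formula (9.9): square root of phi(1 - phi)/N."""
    return sqrt(phi * (1 - phi) / n)


def exact_prob_error_at_least(bound: float, phi: float, n: int) -> float:
    """Exact (binomial) probability that the sample proportion differs
    from phi by `bound` or more in random samples of n units."""
    total = 0.0
    for k in range(n + 1):
        if abs(k / n - phi) >= bound:
            total += comb(n, k) * phi**k * (1 - phi) ** (n - k)
    return total


se = standard_error_of_proportion(0.4, 600)           # .02
exact = exact_prob_error_at_least(2.58 * se, 0.4, 600)
print(round(se, 4), round(exact, 4))
```

The exact probability may be set beside the value of approximately 0.01 read
from the normal model; the two should be of the same order when φ is near .5
and N is as large as 600.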

Actually this theory is a special application of the theory of Rule 9.2
and Rule 9.2a, for if we assign the score of one to population units
classified as A's, and zero to units which are not A's, then φ and p are
respectively the population and sample means, and φ(1 − φ) is the population
variance.* Rule 9.2 applied to these facts leads to Rule 9.5.

As was true of the theory of Rule 9.2, that of Rule 9.5 is subject to the
shortcoming that it is difficult to say just how large N must be in order for
the normal distribution to provide a sufficiently accurate model. If the
value of φ is near .5 (say between .4 and .6), so that the population
distribution is not too asymmetrical, then a fairly accurate approximation is
provided by the normal distribution with samples of 100.† On the other
hand, if φ differs considerably from .5, extremely large samples (1,000 or
more) are necessary to justify the use of this approximate theory. In fact,
if φ is near one or zero a special theory not considered in this text may
need to be employed.‡ It is clear from these remarks that requirements
regarding sample size are much more stringent in the case of the proportion
as the statistic than in the case of the mean.


*To see that φ and φ(1 − φ) are the mean and variance of such a population
consider the table and calculations below.

    Classes    Score (X)    rf        (rf)X    (rf)X²
    A              1        φ           φ        φ
    Not A          0        1 − φ       0        0
    Totals                  1           φ        φ

Now using (5.3) or (5.16) we obtain μ = φ, and the variance is
φ − φ² = φ(1 − φ).

†Actually it would be safer to adopt 400 as the minimum sample to which this
theory should be applied. It is possible to improve the accuracy of the
approximation provided by this theory by the application of a correction
known as Yates' Correction. Discussion of this correction is beyond the scope
of this text, but the student wishing to apply this theory to samples as
small as 100 would do well to investigate the nature of this correction. For
certain purposes this correction may even serve to relax the restriction on
sample size to possibly 50 provided φ is no less than .4 or more than .6.

‡In such cases a distribution, known as the Poisson distribution, may provide
a better approximation of the sampling distribution than the normal
distribution. The Poisson model is not treated in this text.


9.7 SAMPLING THEORY AS IT APPLIES TO DIFFERENCES BETWEEN
TWO NORMALLY DISTRIBUTED RANDOM VARIABLES


Suppose that we have given two normally distributed populations of
scores (X) designated as Population 1 and Population 2. Let the means
and variances of these two populations be represented by μ₁ and σ²₁, and
μ₂ and σ²₂. Now, suppose a single score is selected at random from each of
these populations. Let the difference X₁ − X₂ between these two scores
be represented by D. Repetition of this procedure would lead to a collection
of D-values and an infinity of such repetitions would lead to a sampling
distribution of D's. Mathematicians have demonstrated the following
facts regarding this sampling distribution.

1. It is a normal distribution.
2. It would have a mean equal to the difference between the means of
   the two populations. That is, it would have the mean

$$\mu_D = \mu_1 - \mu_2$$

3. It would have a variance equal to the sum of the two population
   variances. That is,

$$\sigma^2_D = \sigma^2_1 + \sigma^2_2$$


We shall summarize this theory as a rule. 


RULE 9.6. Given two normally distributed independent* random variables
X₁ and X₂ having respective means μ₁ and μ₂ and variances σ²₁ and σ²₂.
Then the sampling distribution of D = X₁ − X₂ is a normal distribution with
mean μ₁ − μ₂ and variance σ²₁ + σ²₂.


The importance of this theory is that it in turn provides us with a very
important sampling theory regarding the difference between the means (or
medians, or proportions) of two random samples selected independently
from two populations. Let us approach this latter theory in terms of a
more concrete situation.

Suppose that we are interested in investigating the difference in mean
spelling ability, as measured by the score on some spelling test, of a
population of school children taught spelling by some new method and a
population of similar school children taught by some more traditional
procedure. For convenience we shall designate these populations as
Populations 1 and 2 respectively. Since the means of these populations are
unknown to us we shall simply represent them by μ₁ and μ₂. That is, the
difference we wish to investigate is the difference μ₁ − μ₂.

Now suppose we select from Population 1 a random sample of 50 children
for whom the mean score on this test (X̄₁) is 72. Also, suppose that for
a random sample of 60 children taken independently from Population 2 the
corresponding mean score (X̄₂) is 65. Now while the difference X̄₁ − X̄₂
= 72 − 65 = 7 gives us some indication of the difference μ₁ − μ₂, we
recognize that it may involve a certain amount of sampling error. That is, we
know that were we to repeat the procedure, taking respectively a new pair
of independent random samples of 50 and 60 cases from these populations,
the new X̄₁ − X̄₂ difference would almost certainly differ from that
previously obtained owing to chance variation in the composition of the two
sets or pairs of samples. Now by Rule 9.2 we know X̄₁ and X̄₂ to be
(approximately) normally distributed with variances σ²₁/50 and σ²₂/60.
Hence, Rule 9.6 is applicable and we know that the sampling distribution of
the difference X̄₁ − X̄₂ is a normal distribution with mean μ₁ − μ₂ and
variance σ²₁/50 + σ²₂/60. If we now knew the values of the variances of
these populations (i.e., the values of σ²₁ and σ²₂), it would be possible for
us to describe the sampling distribution for an infinity of such X̄₁ − X̄₂
differences.

*At this point in our exposition it is not possible to define precisely the
word independent in terms that would be meaningful to the student. Perhaps
the word unrelated should have been used as possibly being intuitively more
meaningful to the student than the word independent. Relationship would
occur if the selection of a large X₁ tended to be accompanied by the
selection of a large X₂, or the selection of a small X₁ accompanied by the
selection of a small X₂. Actually, at this point it is sufficient for the
student to know that if both values are selected at random from their
respective distributions the condition of independence necessary for the rule
will be satisfied.

For the purpose of showing how this could be accomplished, assume
that we somehow know the variances of these two populations to be 250
and 240 respectively. Then by Rule 9.2, X̄₁ is a normally distributed
variable with mean μ₁ and variance 5 (since 250/50 = 5). Also by this
same rule, X̄₂ is a normally distributed variable with mean μ₂ and variance
4 (since 240/60 = 4). Hence, by Rule 9.6, X̄₁ − X̄₂ has a normal sampling
distribution with mean μ₁ − μ₂ and a variance of 9 (since 5 + 4 = 9) or a
standard error of 3 (since √9 = 3). This sampling distribution is pictured
in Figure 9.4. The positive and negative values shown along the scale
represent distances above and below the "true" difference, μ₁ − μ₂. That
is, the point +3 actually represents a point 3 units above μ₁ − μ₂ on the
X̄₁ − X̄₂ scale. Similarly, the point −6 actually corresponds to the value
6 units below (μ₁ − μ₂) on this scale. In this situation we may note, for
example, that the probability of a given X̄₁ − X̄₂ difference being in error
by 5.9 (since 1.96 × 3 = 5.88) or more is approximately 0.05.

[Figure: normal curve on the X̄₁ − X̄₂ scale, centered at the true
difference μ₁ − μ₂, with standard error 3]

FIGURE 9.4 X̄₁ − X̄₂ sampling distribution
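The arithmetic of this illustration can be laid out as a brief computation.
The sketch below merely restates the steps taken above (Rule 9.2 applied to
each sample mean, then Rule 9.6 applied to their difference); the function
name is an assumption made for the example.

```python
from math import sqrt


def variance_of_difference(var1: float, n1: int, var2: float, n2: int) -> float:
    """Variance of the sampling distribution of the difference between the
    means of two independent random samples: var1/n1 + var2/n2."""
    return var1 / n1 + var2 / n2


var_diff = variance_of_difference(250, 50, 240, 60)   # 5 + 4 = 9
se_diff = sqrt(var_diff)                              # 3
error_bound = 1.96 * se_diff                          # 5.88
print(var_diff, se_diff, round(error_bound, 2))
```

Substituting other assumed population variances or sample sizes shows at once
how the standard error of the difference responds to each.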

We shall make a generalized statement of the sampling theory which 
we have just illustrated in the form of a rule. 




RULE 9.7. Let X̄₁ represent the mean of a random sample of n₁ cases*
taken from any (infinite) population having mean μ₁ and finite variance σ²₁,
and let X̄₂ represent the mean of an independently selected random sample of
n₂ cases from any other (infinite) population having mean μ₂ and finite
variance σ²₂. Then, as n₁ and n₂ increase, the sampling distribution of
X̄₁ − X̄₂ approaches a normal distribution with mean μ₁ − μ₂ and variance
given by

$$\sigma^2_{\bar{X}_1-\bar{X}_2} = \frac{\sigma^2_1}{n_1} + \frac{\sigma^2_2}{n_2} \tag{9.10}$$

RULE 9.7a. The standard error of the sampling distribution of Rule 9.7 is

$$\sigma_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{\sigma^2_1}{n_1} + \frac{\sigma^2_2}{n_2}} \tag{9.11}$$


In a similar fashion through the application of Rules 9.6 and 9.5 it is
possible to derive a sampling theory for the difference between sample
proportions. We shall simply state this theory in the form of a rule
without illustrative elaboration.†


RULE 9.8. Given dichotomous populations, 1 and 2, each consisting of
A's and not-A's. Let φ₁ and φ₂ represent the respective proportions of A's in
these populations. Also let p₁ and p₂ represent the proportions of A's in
independent random samples of n₁ and n₂ cases taken from populations 1 and 2
respectively. Then as n₁ and n₂ increase, the sampling distribution of
p₁ − p₂ approaches a normal distribution with mean φ₁ − φ₂ and variance given
by

$$\sigma^2_{p_1-p_2} = \frac{\phi_1(1-\phi_1)}{n_1} + \frac{\phi_2(1-\phi_2)}{n_2} \tag{9.12}$$

RULE 9.8a. The standard error of the sampling distribution of Rule 9.8 is

$$\sigma_{p_1-p_2} = \sqrt{\frac{\phi_1(1-\phi_1)}{n_1} + \frac{\phi_2(1-\phi_2)}{n_2}} \tag{9.13}$$
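Since Rule 9.8 is stated without an illustration, a small numerical sketch
may help fix the formulas. The population proportions and sample sizes used
below (φ₁ = .5, n₁ = 100, φ₂ = .4, n₂ = 150) are hypothetical values invented
for this example, and the function name is likewise an assumption.

```python
from math import sqrt


def variance_of_proportion_difference(phi1, n1, phi2, n2):
    """Formula (9.12): phi1(1 - phi1)/n1 + phi2(1 - phi2)/n2."""
    return phi1 * (1 - phi1) / n1 + phi2 * (1 - phi2) / n2


# Hypothetical populations: 50 per cent and 40 per cent A's.
var_dp = variance_of_proportion_difference(0.5, 100, 0.4, 150)
se_dp = sqrt(var_dp)          # formula (9.13)
mean_dp = 0.5 - 0.4           # mean of the sampling distribution of p1 - p2
print(round(var_dp, 4), round(se_dp, 4), round(mean_dp, 2))
```

Here the two terms of (9.12) are .0025 and .0016, so that the sample with the
smaller number of cases contributes the larger share of the variance.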


*To this point we have usually used an upper-case N to represent the number
of scores in a collection or the number of cases in a sample. However, when
the total collection of scores may be viewed as consisting of subsets of
scores we shall use a lower-case n to represent the number of scores in a
subset. This leaves us free to use the upper-case N to represent the total
number of scores in all subsets. (For examples of previous use of this
scheme, see Sections 3.9 and 3.10 and Rule 5.1.) In the present situation we
may regard the total collection of data at hand as consisting of two subsets,
namely the two samples.

†It should be observed that a similar theory could also be stated with regard
to the difference between sample medians. While no statement of this theory
is given in this text, the student may find it profitable to attempt such a
statement as an exercise.


9.8 APPROXIMATING DESCRIPTIONS OF SAMPLING DISTRIBUTIONS


Except for the limited case of the sampling distribution of the mean for
random samples from a normally distributed population (Rule 9.1) and for
the situation of Rule 9.6, all the theoretical sampling distributions which
we have presented in the foregoing sections are approximate in character.
That is, all the sampling distributions, except these two, only tend toward
or approach the normal-distribution model as the sample size increases.
In each case, however, suggestions were made regarding the minimum
sample size necessary to make the use of the theoretical model sufficiently
accurate for practical purposes.

There remains another aspect of all the sampling distributions as they
have thus far been described (including those of Rules 9.1 and 9.6) which
restricts their practical usefulness. This is the fact that the specification
of any of these distributions in a particular case requires knowledge of
certain population facts (parameters). For example, specification of the
distributions involving means implies knowledge of the means and the
variances of the populations involved, while specification of sampling
distributions involving the proportions of A's in dichotomous populations
of A's and not-A's implies knowledge of these very proportions. Obviously
knowledge of this type is not generally available, for if it were no sampling
theory would be necessary.

In spite of this it is still possible to make useful applications of these
models. It is necessary, however, to accept further approximations,
namely, such approximations of the needed population parameters as can
be derived from the information contained in the sample at hand.

An indication of the value of a population mean, median, proportion,
or of the difference between the means or proportions of two populations
is readily obtainable from the sample or samples as the case may be. For
example, we note from either Rule 9.1 or 9.2 that the mean of the sampling
distribution of means of random samples is the same as the mean of the
population from which the samples come. Hence, the expected value of
the mean of a random sample [E(X̄)] is the population mean.* That is,
the mean of a random sample provides an unbiased estimate of the population
mean (see definition of bias). Obviously similar conclusions apply in
the case of medians, proportions and differences between means and
proportions. By way of summary we have the following rules.

RULE 9.9. The following statistics derived from random samples selected
from a given population provide unbiased estimates of the corresponding
population parameters:

    X̄, mdn, and p

RULE 9.10. The following differences derived from independent random
samples selected from Populations 1 and 2 provide unbiased estimates of the
differences between the corresponding parameters:

    X̄₁ − X̄₂ and p₁ − p₂

*See Section 5.13 for a discussion of expected value.


In addition to these estimates of location or central tendency, the
specification of the approximate sampling distributions under consideration
also implies the availability of an estimate of the variance or standard
deviation of the populations involved. Unlike these averages, the expected
value of the variance of a random sample is not the population variance.
In other words the sample variance does not provide an unbiased estimate
of the population variance. While consideration of the sampling distribution
of variances of random samples is beyond the scope of this text, one
fact about this distribution is of great importance, namely, the fact that
its mean is given by

$$M(\mathfrak{S}^2) = \frac{N-1}{N}\,\sigma^2 \tag{9.14}$$

where M(𝔖²) = the mean of the 𝔖² sampling distribution,
      σ² = the variance of the population from which the samples come, and
      N = the sample size


Formula (9.14) shows that the mean value of the sampling distribution
of random sample variances is somewhat smaller than the population
variance, since the factor (N − 1)/N must necessarily always be less than
one. For example, if N is 5, the mean of the distribution of sample variances
is 4/5 of the population variance, and if N is 100 the mean of the sample
variances is .99 of the population variance. It is also clear that as N
increases, the magnitude of this bias decreases. Now we have previously
learned (see Rule 5.4) that if each score in a collection is multiplied by
some constant the mean of the new collection thus formed is equal to the
mean of the original scores multiplied by this constant. Suppose now that
instead of considering a distribution of 𝔖²-values, we consider a
distribution of values consisting of the product of each 𝔖² times the
constant N/(N − 1). Then the mean of this new distribution is equal to the
mean of the 𝔖² distribution multiplied by this same constant. That is,

$$M\!\left(\frac{N\mathfrak{S}^2}{N-1}\right) = \frac{N}{N-1}\,M(\mathfrak{S}^2) = \frac{N}{N-1}\cdot\frac{N-1}{N}\,\sigma^2 = \sigma^2$$

Hence, it follows that the population variance is the expected value of the
statistic N𝔖²/(N − 1). That is, N𝔖²/(N − 1) is an unbiased estimate of
the population variance σ². We shall again summarize in the form of a rule.



RULE 9.11. Let 𝔖² represent the variance of a random sample of size N
from a population having variance σ², and let σ̃² represent an unbiased
estimate of σ². Then

$$\tilde{\sigma}^2 = \frac{N\mathfrak{S}^2}{N-1} \tag{9.15}$$
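Formula (9.14) and Rule 9.11 can be verified exactly for a small case by
listing every possible sample. The sketch below assumes a hypothetical
population in which the scores 1 through 6 are equally likely (as with a fair
die) and enumerates all equally likely random samples of N = 3 scores; the
population, the sample size, and the function name are all choices made for
this illustration.

```python
from fractions import Fraction
from itertools import product


def variance_divisor_n(scores):
    """Variance as defined in this text: squared deviations from the mean,
    summed and divided by the number of scores (not by N - 1)."""
    n = len(scores)
    mean = Fraction(sum(scores), n)
    return sum((Fraction(x) - mean) ** 2 for x in scores) / n


population = [1, 2, 3, 4, 5, 6]      # hypothetical equally likely scores
N = 3
sigma_sq = variance_divisor_n(population)          # population variance

samples = list(product(population, repeat=N))      # all 216 possible samples
mean_of_sample_variances = sum(variance_divisor_n(s) for s in samples) / len(samples)
mean_of_estimates = sum(
    variance_divisor_n(s) * Fraction(N, N - 1) for s in samples
) / len(samples)

print(sigma_sq, mean_of_sample_variances, mean_of_estimates)
```

Exact fractions are used so that the comparison does not depend on rounding:
the mean of the sample variances equals (N − 1)/N times the population
variance, and the mean of the corrected values equals the population variance
itself.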




Three distinct types of quantitative facts enter into this rule: (1) the
population fact or parameter, (2) the sample fact or statistic, and (3) an
unbiased estimate of the population fact based on information contained in
the sample. To represent these facts we have, in keeping with common
practice, generally employed a Greek character or letter to represent the
parameter and usually an English letter (where possible the corresponding
one) to represent the statistic (see footnote, p. 180). Where the estimate
differs from the sample fact (the statistic) a third symbol is needed.
Therefore, to represent an estimate of a population parameter we shall
use its Greek representational character superposed by a tilde (~). We
shall continue to employ this notational scheme throughout the remainder
of this book.


RULE 9.11a.

$$\tilde{\sigma}^2 = \frac{\sum x_i^2}{N-1}, \quad \text{where } x_i = X_i - \bar{X} \tag{9.16}$$

This result follows directly from substituting into (9.15) the equivalent
of 𝔖² as stated in (6.4).

Many writers follow the practice of defining the variance of any
collection of scores by (9.16), that is, as involving division by N − 1
rather than by N. This practice has the advantage of simplifying some of the
formulas that arise in sampling theory. For reasons stated in the preface
(p. vii), we have elected not to follow this practice. In keeping with the
above remarks on notation, the writers who do follow this practice have
generally used the lower-case English s² to represent the sample variance
and σ² to represent the population variance. They, of course, have no
need for the third symbol (σ̃²) to represent the estimate of the population
variance, since the sample variance, as they define it, is this estimate. We
should have liked to use s² to represent the sample variance as we have
chosen to define it. We realized, however, that students of this book who
seek to expand their knowledge of statistics by using other texts as
references may become confused over the different meanings of the symbol s².
Consequently we departed from the scheme of using an English letter to
represent the sample variance and have, throughout this book, used instead
the German character 𝔖².

RULE 9.11b. If p represents the proportion of A's in a random sample
from a dichotomous population, then

$$\tilde{\sigma}^2 = \frac{Np(1-p)}{N-1} \tag{9.17}$$

This result follows from the fact that for the sample 𝔖² = p(1 − p); see
footnote, page 253.




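We are now in a position to write formulas providing unbiased
estimates of the variances of the sampling distributions thus far considered.
For the sampling distributions of Rules 9.1 and 9.2 we have, upon
substituting (9.15),

$$\tilde{\sigma}^2_{\bar{X}} = \frac{\tilde{\sigma}^2}{N} = \frac{\mathfrak{S}^2}{N-1} \tag{9.18}$$

For the sampling distribution of the median where the population is
normally distributed, we have (see Rule 9.4b)

$$\tilde{\sigma}^2_{mdn} \approx \frac{(1.25)^2\,\mathfrak{S}^2}{N-1} \tag{9.19}$$

For the sampling distribution of Rule 9.5 we have, upon substituting (9.17),

$$\tilde{\sigma}^2_{p} = \frac{1}{N}\cdot\frac{Np(1-p)}{N-1} = \frac{p(1-p)}{N-1} \tag{9.20}$$

Now applying (9.18) we have for an approximation of the variance of the
sampling distribution of Rule 9.7

$$\tilde{\sigma}^2_{\bar{X}_1-\bar{X}_2} = \frac{\mathfrak{S}^2_1}{n_1-1} + \frac{\mathfrak{S}^2_2}{n_2-1} \tag{9.21}$$

And applying (9.20) we have for an approximation of the variance of the
sampling distribution of Rule 9.8

$$\tilde{\sigma}^2_{p_1-p_2} = \frac{p_1(1-p_1)}{n_1-1} + \frac{p_2(1-p_2)}{n_2-1} \tag{9.22}$$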


It should not be inferred from (9.15) and (9.17) that the square roots
of these unbiased variance estimates are also unbiased estimates of the
population standard deviations. That is,

$$M\!\left(\mathfrak{S}\sqrt{\frac{N}{N-1}}\right) \neq \sigma$$

$$M\!\left(\sqrt{\frac{Np(1-p)}{N-1}}\right) \neq \sigma$$

This follows from the fact that the mean of the square roots of a
collection of values is not in general equal to the square root of their
mean. For example, consider the scores 4, 25, and 121. Here X̄ = 50 and
√X̄ = 7.071. But the square roots of these scores are 2, 5, and 11, and the
mean of these square roots is 6.
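The numerical example just given is easily checked. The few lines below
simply repeat the computation for the scores 4, 25, and 121; the variable
names are illustrative.

```python
from math import sqrt

scores = [4, 25, 121]

mean_of_scores = sum(scores) / len(scores)                  # 50
root_of_mean = sqrt(mean_of_scores)                         # about 7.071
mean_of_roots = sum(sqrt(x) for x in scores) / len(scores)  # (2 + 5 + 11)/3 = 6

print(mean_of_scores, round(root_of_mean, 3), mean_of_roots)
```

The root of the mean exceeds the mean of the roots here, which is the reason
the square root of an unbiased variance estimate cannot in general be an
unbiased estimate of the standard deviation.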

In spite of the fact that the square roots of (9.15) and (9.17) do not
provide unbiased estimates of the population standard deviation they,
nevertheless, have been shown to provide estimates of great usefulness
both theoretically and practically. Consequently we shall use as an
estimate of a population standard deviation the square root of the unbiased
estimate of the population variance. For sake of completeness, we list
below the formulas for estimating population standard deviations and also
for estimating standard errors of sampling distributions. In each case
these are simply the square root of the corresponding variance estimate.


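$$\tilde{\sigma} = \mathfrak{S}\sqrt{\frac{N}{N-1}} \tag{9.23}$$

$$\tilde{\sigma} = \sqrt{\frac{Np(1-p)}{N-1}} \tag{9.24}$$

$$\tilde{\sigma}_{\bar{X}} = \frac{\mathfrak{S}}{\sqrt{N-1}} \tag{9.25}$$

$$\tilde{\sigma}_{mdn} = \frac{1.25\,\mathfrak{S}}{\sqrt{N-1}} \tag{9.26}$$

$$\tilde{\sigma}_{p} = \sqrt{\frac{p(1-p)}{N-1}} \tag{9.27}$$

$$\tilde{\sigma}_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{\mathfrak{S}^2_1}{n_1-1} + \frac{\mathfrak{S}^2_2}{n_2-1}} \tag{9.28}$$

$$\tilde{\sigma}_{p_1-p_2} = \sqrt{\frac{p_1(1-p_1)}{n_1-1} + \frac{p_2(1-p_2)}{n_2-1}} \tag{9.29}$$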


By employing these estimates of population parameters together with
those of Rules 9.9 and 9.10, it is possible to describe approximately in a
particular case the nature of the sampling distribution of a mean, median,
proportion, difference between two means, or difference between two
proportions. The necessary information can all be gleaned from a single
trial of the sampling procedure. As previously indicated the accuracy of
such descriptions depends upon the size of the sample. Our previous
remarks regarding minimum values for N were made in anticipation of the
use of these estimated parametric values and, hence, are still applicable.
We shall conclude this section with two examples.


Example 1. Consider a population of voters, a certain proportion of
whom favor a particular candidate for a political office. Suppose that in a
random sample of 530 individuals selected from this population, 244
identified themselves as being in favor of this candidate. On the basis of
this information describe the approximate character of the sampling
distribution of the proportion of individuals favoring this candidate.

Solution. Here p = 244/530 = .46+ and, hence, φ̃ = .46 (see Rule
9.9). Now applying (9.27)

$$\tilde{\sigma}_{p} = \sqrt{\frac{(.46)(.54)}{529}} \approx .0217$$

Finally we know from Rule 9.5 that this sampling distribution is
approximately normal in form. This approximate distribution is pictured in
Figure 9.5.


[Figure: normal curve centered at φ̃ = .46 on the p-scale]

FIGURE 9.5 Approximate sampling distribution of
proportions of voters favoring a particular candidate in random
samples of 530 cases


Comment. It is important to note that the pictured distribution is 
approximate in three respects: (1) the actual sampling distribution is only 
approximately normal in form; (2) its placement along the scale (i.e., its 
mean) may differ somewhat from that of the pictured distribution; and 
(3) its standard error may also differ from that shown. 
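The computation of Example 1 can be sketched as follows. The function name is
an assumption made for the illustration; the counts 244 and 530 are those of
the example, and the unrounded sample proportion is carried through the
calculation, so the result may differ slightly from one obtained with p
rounded to .46.

```python
from math import sqrt


def estimated_standard_error_of_proportion(count_a: int, n: int):
    """Return the sample proportion p and the estimate of formula (9.27),
    the square root of p(1 - p)/(N - 1)."""
    p = count_a / n
    return p, sqrt(p * (1 - p) / (n - 1))


p, se_p = estimated_standard_error_of_proportion(244, 530)
print(round(p, 2), round(se_p, 4))
```

Both quantities come from the one sample at hand, which is why the pictured
distribution is approximate in the three respects noted in the comment.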


Example 2. Consider two hypothetical populations of fourth-grade
school pupils. Suppose that the pupils comprising one of these populations
have been taught a particular skill in arithmetic by Method 1 while those
belonging to the other population have been taught this same skill by
Method 2, and that the pupils of both populations have been given the same
criterion test to determine the extent to which this skill has been mastered.
Now assume that random samples of 50 and 65 cases are drawn from these
populations respectively, that the means of the criterion scores for these
samples are 100 and 80, and that the standard deviations are 28 and 24.
On the basis of this information describe the approximate sampling
distribution of the difference between means (i.e., of the statistic
X̄₁ − X̄₂).

Solution. Here X̄₁ − X̄₂ = 100 − 80 = 20, and hence (see Rule 9.10)
the estimate of μ₁ − μ₂ is 20. Also applying formula (9.28) we obtain

$$\tilde{\sigma}_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{28^2}{49} + \frac{24^2}{64}} = \sqrt{16 + 9} = 5$$

Finally we know from Rule 9.7 that the sampling distribution involved is
approximately normal in form. This approximate distribution is pictured
in Figure 9.6.




[Figure: normal curve centered at the estimated difference of 20 on the
X̄₁ − X̄₂ scale]

FIGURE 9.6 Approximate sampling distribution of
difference between means of two random samples

Comments. First, it should be noted that the pictured sampling
distribution is approximate in precisely the same three respects as that of
the preceding example. Second, it should be observed that in a practical
situation in which the purpose is to evaluate experimentally the relative
effectiveness of these two methods of instruction, there would exist only a
single hypothetical population consisting not only of pupils now attending
fourth grade, but also of such pupils as may attend similar fourth-grade
classes in the future. Two groups of pupils selected from the currently
enrolled group of fourth-grade pupils (these would usually have to consist of
intact classes) would then be assumed to be random samples from this
hypothetical population. The methods of instruction would be assigned to
these groups by some random procedure and the criterion scores for these
groups of pupils would be the only criterion scores obtained. These two
groups of scores would provide the information necessary for specifying the
approximate sampling distribution. Thus, there always exists the additional
danger in an experimental design of this type that the groups cannot, as
we assumed, be reasonably regarded as random samples from hypothetical
populations of fourth-grade pupils, that is, populations which differ only in
that the pupils comprising them have been taught a particular arithmetic
skill by different methods. Of course, if the assumption of randomness cannot
reasonably be defended, then the sampling theory which we have described
is not applicable.




TESTING STATISTICAL 
HYPOTHESES 


10.1 THE NOTION OF INDIRECT PROOF


The student may recall from his study of plane geometry in high school
a method of proof known as indirect proof or reductio ad absurdum. This
method of proof consists simply of listing all possibilities and showing that
all, save one, are contradictory to known fact, that is, lead to an absurdity.
The steps in the procedure are as follows:


1. List all possibilities.

2. Assume or hypothesize one of these possibilities to be true.

3. Seek to determine whether or not this hypothesis leads to a
contradiction of known fact.

4. If such a contradiction is discovered, reject the hypothesis as false.

5. Repeat Steps 2, 3, and 4 with other possibilities until only one remains
in the list. This one remaining possibility must then be true.


The success of this method of proof depends (1) upon a complete listing
of all possibilities and (2) upon successful discovery of a contradiction. It
is particularly important to appreciate the fact that failure to discover a
contradiction to an assumed or hypothesized possibility does not in any
sense constitute proof that this possibility is true, for other possibilities
may be equally tenable in the sense that they too do not appear to involve
contradiction, and for the added reason that failure to discover a
contradiction does not necessarily mean that one does not exist. The most that can
be said for an uncontradicted possibility is that it remains a tenable
possibility since it cannot be eliminated from the list. Proof of its truth occurs
only when it remains alone as the only uncontradicted possibility, and,
then, only if the list of possibilities investigated is complete.

Let us first consider a non-mathematical application of this form of
proof. C is charged with the commission of a certain crime and his trial
by jury is in progress. The attorney for his defense states that two, and
only two, possibilities exist, namely, (1) C is guilty of the crime, or (2) C
is not guilty. The attorney opens the defense by hypothesizing the first
of these possibilities to be true. He points out that if, as thus hypothesized,
C is guilty, then it follows that he must have been present at the scene of
the crime at the time of its occurrence. After establishing the scene and
time of the crime the attorney proceeds, through reliable witnesses, to show
that at this particular time C was elsewhere and thus establishes a
contradiction to the hypothesized possibility and, hence, the only other
possibility (C's innocence) is proved.

As a second example (drawn from plane geometry) suppose we wish
to prove that in a triangle having two sides of unequal length, the angle
opposite the longer of these two sides is larger than the angle opposite the
shorter. In terms of Figure 10.1, suppose that it is a known fact that BC
is longer than AB (i.e., BC > AB). Our problem, then, is to prove that
Angle A (which is opposite side BC) is larger than Angle C (which is
opposite side AB). Or stated symbolically, we wish to prove that A > C.


Figure 10.1 Triangle with side BC longer than side AB


Now let us assume that the following facts are known or have been
previously proved and are, therefore, at our disposal: (1) the fact that if
two angles of a triangle are equal in size then the sides opposite them are
equal in length; and (2) the fact that if two angles of a triangle are unequal
then the sides opposite them are unequal, the longer being that which lies
opposite the larger angle.

We begin by a complete listing of possibilities as follows:


Possibility 1: A = C.
Possibility 2: A < C.
Possibility 3: A > C.




Next we hypothesize Possibility 1 to be true. But if this possibility is 
true then by the previously established fact (1) above, it follows that side 
AB equals side BC. But this is contradictory to the known fact that side 
BC is longer than side AB, and hence Possibility 1 is eliminated from the 
list. We continue by hypothesizing Possibility 2 to be true. But, if this 
possibility is true it follows from previously established fact (2) that side 
AB must be longer than side BC, which again contradicts the known fact 
that BC is longer than AB. Thus Possibility 2 is eliminated and Possibility
3, the only remaining possibility, is proved true.
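
The elimination carried out in the triangle example can be mimicked mechanically. The following Python sketch is illustrative only; the dictionary and the function name `eliminate` are hypothetical devices of ours. Each possibility is paired with the side relation it implies under facts (1) and (2), and any possibility whose implication contradicts the known fact is dropped.

```python
# Known fact: side BC is longer than side AB.
known_side_relation = "BC > AB"

# Side relation implied by each hypothesized angle relation,
# according to previously established facts (1) and (2).
implied_side_relation = {
    "A = C": "BC = AB",   # equal angles imply equal opposite sides
    "A < C": "BC < AB",   # larger angle C implies longer opposite side AB
    "A > C": "BC > AB",   # larger angle A implies longer opposite side BC
}

def eliminate(possibilities, known):
    """Return the possibilities whose implication does not contradict the known fact."""
    return [p for p, implied in possibilities.items() if implied == known]

remaining = eliminate(implied_side_relation, known_side_relation)
print(remaining)
```

Because the list of possibilities is complete and only one survives, that one is proved. Had two survived, neither would have been proved.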


10.2 TESTING STATISTICAL HYPOTHESES: INTRODUCTORY REMARKS


The testing of a statistical hypothesis is a process for drawing some 
inference about the value of a population parameter from the information 
contained in a sample selected from the population. 

The logic involved is, in many respects, similar to that of indirect
proof. In one major aspect, however, it differs markedly. In indirect
proof, a possibility is eliminated only when it is found to lead to a definite
contradiction of known fact. In testing a statistical hypothesis, on the
other hand, the hypothesis (i.e., the possibility under consideration) is
rejected (i.e., eliminated) if a specific occurrence of an event can be shown
to be highly unlikely if the hypothesis is assumed true. In other words, if
this event is inconsistent with the hypothesis because the probability of
its occurring is low, then the hypothesis is rejected as a possibility. The
event referred to is always the value obtained for some statistic in the case
of a particular sample, while the hypothesis is a particular value of some
parameter selected from among all values which are possible. If, upon
referral to the sampling distribution which would apply, assuming the
hypothesis to be true, the particular obtained value of the statistic is found
to be an unusual or improbable one, then its occurrence is regarded as
sufficiently inconsistent with the hypothesis to justify rejection of the
hypothesis as a possible value of the parameter. It will be recognized that
this technique does not afford rigorous and incontrovertible proof in the
sense of indirect proof, since possibilities are eliminated because of the
occurrence of events which are only unlikely rather than impossible under
the conditions hypothesized.


Difficult as they may be to appreciate when presented void of
illustration, we shall next outline the steps involved in testing statistical
hypotheses. Illustrative examples, definitions of certain terminology, and further
discussions of the logical aspects of the process will be presented in following
sections.

STEP 1. State the statistical hypothesis to be assumed true.

Comment. This calls for selecting a value from among those a
population parameter could conceivably take, and assuming it to be the true
value. This corresponds to the step in indirect proof of selecting from
among the possibilities one to be hypothesized as true.


STEP 2. Specify the level of significance to be used.


In general terms, level of significance refers to the degree
of improbability which is deemed necessary to cast sufficient doubt upon the
truth of the hypothesis to warrant its rejection.


Comment. The level of significance is stated in terms of some small
probability value such as .10 (i.e., one in ten), or .05 (i.e., one in twenty),
or .01 (i.e., one in a hundred), or even .001 (i.e., one in a thousand). The
choice of a particular probability value is a purely arbitrary one and need
not be limited to the particular values just cited. Considerations
influencing the choice will be treated in a later section. It is customary to represent
this probability value by the Greek letter alpha (α). There is no
corresponding step in the process of indirect proof for the obvious reason that
absolute contradiction rather than improbability is the criterion for
rejection. It should be appreciated that in selecting a level of significance we
are simply indicating what we mean by the phrase "sufficiently improbable"
when we state that under the terms of the hypothesis considered, the
observed value of the statistic is "sufficiently improbable" of occurrence to
discredit (i.e., to cause us to reject) this hypothesis.


STEP 3. Specify the critical region to be used.

DEFINITION. A critical region is a portion of the scale of possible values
of the statistic so chosen that if the particular obtained value of the statistic
falls within it, rejection of the hypothesis is indicated.

Comment. There are two criteria for choosing the critical region.
First, it must be made consistent with the level of significance adopted.
This implies that it be so located that if the hypothesis is true, the probability
of the statistic falling in it equals (at least does not exceed) this level of
significance. Second, it should be so located that if the hypothesis is not
true, the probability of the statistic falling within it is a maximum. That
is to say, the ideal critical region is one such that if the hypothesis is false,
the chances of rejecting this false hypothesis become as large as possible
within the limits of the framework of the particular investigation. The
task of locating critical regions so as best to comply with these criteria will
be discussed later.

STEP 4. Carry out the sampling study as planned and compute the value of
the test statistic.

Comment. The phrase test statistic is simply used here to refer to the
statistic employed in effecting the test of the hypothesis. It is important
to note that decisions regarding the three preceding steps can (in fact,
should) be made before the sample is selected and data gathered.


*A more precise definition is presented in Section 10.11.


STEP 5. Refer the value of the test statistic as obtained in Step 4 to the critical
region adopted. If the value falls in this region, reject the hypothesis.
Otherwise, retain or accept the hypothesis as a tenable (not disproved)
possibility.
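

The five steps can be sketched in Python for one common situation. This is a minimal sketch under stated assumptions: the test statistic is approximately normally distributed, the critical region lies in the lower tail, and the function name `lower_tail_z_test` and the numbers in the usage lines are hypothetical. The hypothesized value and alpha (Steps 1 and 2) are passed in as arguments; the function carries out Steps 3 to 5.

```python
from statistics import NormalDist

def lower_tail_z_test(statistic, hypothesized, std_error, alpha):
    """Lower-tailed test on an approximately normal statistic (illustrative)."""
    # Step 3: critical region is z <= z_critical, with area alpha below it.
    z_critical = NormalDist().inv_cdf(alpha)
    # Step 4: express the obtained statistic as a z-value under the hypothesis.
    z = (statistic - hypothesized) / std_error
    # Step 5: reject if z falls in the critical region, otherwise retain.
    decision = "reject" if z <= z_critical else "retain"
    return z, z_critical, decision

# Hypothetical numbers, chosen only to exercise the function.
z, z_crit, decision = lower_tail_z_test(
    statistic=47.0, hypothesized=50.0, std_error=1.5, alpha=0.05
)
print(round(z, 2), round(z_crit, 3), decision)
```

Note that `inv_cdf` returns an unrounded critical value, whereas a printed normal table gives a value rounded to two decimal places.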


An illustration of the application of this technique is presented in the 
following section. 


10.3 THE PROBLEM OF THE PRINCIPAL AND THE SUPERINTENDENT


One day the principal of an elementary school in a city school system
approached the superintendent contending that the population of children
which fed into his building were, "on the whole," subnormal in intelligence
and, as a consequence, almost impossible to bring up to the educational
level achieved by the pupils of other elementary schools in the city. He
directed attention to the fact that this population lived, for the most part,
in slum dwellings in an "across-the-tracks" district in an environment which
he contended to be completely devoid of incentive for achieving educational
success and entirely lacking in opportunity for enriching extra-school
experience. As further evidence in support of his contentions, he pointed
to the low standing of his school as measured by city-wide testing programs,
to the disproportionate number of pupils from his school who failed in
junior high school, and to the high incidence of delinquency among these
pupils. Furthermore, he vigorously rejected as a possible alternative
explanation any lack of efficiency on the part of his staff or in the operation
of his school's program. As a solution to the problem, he urged that special
funds be appropriated to enable him to construct special rooms, to engage
special teachers in addition to his regular staff, and to purchase special
equipment, aids, and materials adapted to the peculiar needs of slow
learners. He argued that only through such measures could his school hope to
raise its pupils to the educational level achieved by the pupils of other
elementary schools in the system.

The superintendent gave sympathetic audience but reserved personal
doubt regarding the principal's notions of the character of the school's
population. He asked the principal for time to consider and decided to
undertake a statistical investigation of the intelligence characteristics of
this population. This implied selecting a sample from the population,
measuring the intelligence of its units (children) and inferring from the
results whether or not the principal's characterization of the population
was accurate. He decided that he would use the IQ score yielded by the
Wechsler Intelligence Scale for Children (WISC) as a measure of
intelligence. The WISC is a generally accepted measure of intellectual ability
which must be administered individually to each child by a specially trained
expert. The superintendent estimated that all considered, it would be
impossible to administer this test to more than four pupils per school day. At
this rate it would require the full time of one school psychologist for 16
school days (more than three work-weeks) to obtain IQ's for 64 children.
He felt hard pressed to justify even this great an investment of time on the
part of the school psychologist. He decided, nevertheless, to ask the
psychologist to obtain WISC IQ's for a random sample of 65 children selected
from among those currently enrolled in the school in question. He felt that
it was reasonable to assume that the children currently enrolled constituted
a random sample from the hypothetical population of children who would
attend the school during the expected life of the special facilities
recommended by the principal, and that, hence, a random subdivision or subset
of the pupils currently enrolled could reasonably be regarded as a random
sample from the population to which he wished to extend or generalize his
observations. In due time the 65 IQ scores arrived on his desk. The uses
he made of them in attempting to arrive at a decision about the principal's
recommendation are described in following sections.


10.4 THE PROBLEM OF THE PRINCIPAL AND THE SUPERINTENDENT:
SOLUTION I


STEP 1. The statement of the statistical hypothesis.


The superintendent recognizes that "on the whole" the population of
children concerned can be either below normal in intelligence, normal in
intelligence, or above normal in intelligence. He reasons that insofar as his
particular problem is concerned, there is no difference between the latter
two possibilities. Certainly he would not wish to approve the principal's
recommendation if either of these were true. Hence, he decides to reduce
the problem to the consideration of just two possibilities, namely, (1) the
population of children is, "on the whole," normal in intelligence, and (2)
the population of children is, "on the whole," below normal in intelligence.

He next considers the question of the meaning of the phrase "on the
whole." He quickly discards, as invalid for the purpose of this problem,
the notion that "on the whole" means all or even a large majority of the
children comprising the population. After some consideration he decides,
quite arbitrarily, to define "on the whole" to apply to the mean IQ for the
population. Since an IQ of 100 implies normal intellectual ability, the
two possibilities can now be translated into the statements: (1) the mean
IQ score for the population is 100, and (2) the mean IQ score for the
population is less than 100. Stated symbolically these possibilities are:


(1) μ = 100
(2) μ < 100




The superintendent chooses to test statistically the first named
possibility. That is, he hypothesizes that μ = 100. The alternative is that
μ < 100.


STEP 2. The selection of the level of significance.


The considerations entering into the choice of a level of significance can
best be presented later. Hence, at this point we shall simply state that the
superintendent is concerned lest he approve the principal's proposal only
to discover later that the population is not below normal in intelligence.
In other words, he is afraid that he may err by rejecting a hypothesis that
is actually true. As a reasonable safeguard against this possibility, he
decides to choose a rather small probability value as his definition of the
degree of improbability sufficient to discredit the hypothesis. The value
he selects is .01 (one in a hundred). That is, he lets α = .01.


STEP 3. The specification of the critical region.


To specify a critical region it is first necessary to at least approximate
the sampling distribution that the statistic would obey if the hypothesis
under test were actually true. Because the statistic involved is the mean
of a "large" random sample, Rule 9.2 applies. That is, the sampling
distribution is approximately normal in form with a mean of 100 (the
hypothesized value of the mean IQ of the population from which the sample is
presumed to have been randomly selected) and a standard error of σ/√65,
where σ is the standard deviation of the population of IQ scores (see Rule
9.20). Now, of course, the superintendent does not know the value of σ,
nor is he interested in its value except for the purpose of determining the
standard error (σ_X̄) of the sampling distribution. Consequently, he is
compelled to use an estimate of σ based on the sample. Formula (9.23)
indicates the appropriate estimate which in turn could be divided by √N
to provide the required estimated value of the standard error. Since the
only use the superintendent has for an estimate of σ is to obtain an estimate
of the required standard error (that is, since he has no interest in an
estimate of σ for its own sake), it is possible for him to take advantage of
the computational short cut provided by formula (9.25). This requires
that he first determine the sample standard deviation, s. Working with
the 65 IQ scores and applying formulas (6.6) and (6.5) he finds the value
of s to be 20. Then applying (9.25) he obtains

$$\hat{\sigma}_{\bar{X}} = \frac{s}{\sqrt{N - 1}} = \frac{20}{\sqrt{64}} = 2.5$$


He then sketches the approximate sampling distribution shown in
Figure 10.2. Now since the only admissible possibilities with respect to the
value of μ are that either μ = 100 or μ < 100, the only explanation for an
obtained value of X̄ > 100 is the operation of chance in determining the
composition of the sample. On the other hand, two possible explanations
exist for any obtained value of X̄ < 100, namely, (1) the operation of
chance and (2) the possibility that μ is less than instead of equal to 100.


Figure 10.2 Approximate sampling distribution of X̄ for random
samples of 65 cases selected from a population having μ = 100
(X̄-scale marked from 92.5 to 107.5; estimated standard error 2.5;
upper limit of critical region at 94.2)


The smaller the obtained value of X̄, the more plausible the second of these
two explanations becomes. Hence, in this situation the logical location
for a critical region is somewhere down the X̄-scale from the 100 point.
Just how far down the upper limit of the region should be located is
governed by the level of significance. Here the superintendent has adopted an
α of .01. In the unit normal distribution (8.3), .01 of the area lies below a
point 2.33 standard deviations below the mean, i.e., below z = −2.33
(see Table II, Appendix C). To translate this z-value into terms of the
X̄-scale, the superintendent applies formula (8.5) as follows:*

$$\bar{X}_R = (2.5)(-2.33) + 100 = -5.83 + 100 = 94.17 \approx 94.2$$


The portion of the sampling distribution over the critical region thus 
established is the blackened portion of Figure 10.2. 
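

The location of the upper limit of the critical region can be retraced with a few lines of Python. This is a sketch only; the variable names are ours. It computes the limit twice: once with the tabled value z = −2.33 used in the text, and once with the unrounded value returned by the standard library's `NormalDist.inv_cdf`, to show that the two agree to one decimal place.

```python
import math
from statistics import NormalDist

s, n = 20, 65            # sample standard deviation and sample size
mu_0 = 100               # hypothesized population mean
alpha = 0.01             # level of significance

se = s / math.sqrt(n - 1)                 # estimated standard error, 2.5

z_table = -2.33                           # value read from the normal table
x_crit_table = mu_0 + z_table * se        # upper limit of critical region

z_exact = NormalDist().inv_cdf(alpha)     # unrounded z for lower-tail area .01
x_crit_exact = mu_0 + z_exact * se

print(se, round(x_crit_table, 1), round(x_crit_exact, 1))
```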


STEP 4. The determination of the value of the statistic.


To determine the value of the statistic, the superintendent has only
to compute the mean of the 65 IQ scores comprising the sample at hand.
He finds the value of X̄ to be 94.


STEP 5. The decision.

The superintendent now refers the obtained value of X̄ = 94 to the
critical region he has established and notes that it falls in this region.
Hence, he rejects the hypothesis that μ = 100. This decision implies that
μ < 100, since this is the only other remaining possibility (alternative).
The action implied by the outcome of this particular solution to the problem
is the approval of the funds requested by the principal.


*To represent the boundary point of a critical region we shall use the symbol
representing the statistic involved with R written as a subscript.


10.5 THE PROBLEM OF THE PRINCIPAL AND THE SUPERINTENDENT:
A MODIFICATION OF SOLUTION I


We shall consider here a slight modification in the mechanics of the
solution just described. The solution as we shall modify it is the equivalent
of that employed by the superintendent. However, the modified solution
will have the advantage of being somewhat more similar in nature to other
tests of statistical hypotheses which the student may later encounter in
this or more advanced books on statistics. For this reason, this modified
approach will be followed in most of the examples of testing statistical
hypotheses which follow.

The first procedural change occurs in Step 3 in which the critical region
is established. Since the normal distribution provides an approximate
model of the sampling distribution of the statistic involved (X̄), and since
any normally distributed variable can be transformed into the normally
distributed z of (8.3), we shall use this z instead of X̄ as the test statistic.
This requires that the critical region be specified in terms of the z-scale
instead of the X̄-scale. In terms of the z-scale, the critical region chosen
by the superintendent extends downward from −2.33. This may be
written symbolically as follows:*

R: z ≤ −2.33


The second procedural change occurs in Step 4, in which the value of
the test statistic for the sample at hand is determined. Since the test
statistic is now z rather than X̄, we must use the sample data to determine
the value of z for the sample. This is done by application of formula (8.4).
In the superintendent's problem, the value of z is obtained as follows:

$$z = \frac{\bar{X} - \mu}{\hat{\sigma}_{\bar{X}}} = \frac{94 - 100}{2.5} = \frac{-6}{2.5} = -2.4$$

Now to reach a decision (Step 5), we refer this value of z to the critical
region R. Since −2.4 is less than −2.33,† the obtained value of z falls in
R, an outcome which dictates rejection of the hypothesis as before.
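

The modified solution amounts to the following computation, given here as a brief Python sketch with variable names of our own choosing. The last line, which is not part of the superintendent's procedure, shows the area of the unit normal distribution lying below the obtained z; that this area is smaller than .01 is another way of seeing that z falls in the critical region.

```python
from statistics import NormalDist

x_bar, mu_0, se = 94, 100, 2.5    # sample mean, hypothesized mean, standard error
z_crit = -2.33                    # critical region R: z <= -2.33

z = (x_bar - mu_0) / se           # formula (8.4) applied to the sample mean
in_critical_region = z <= z_crit

# Supplementary check: lower-tail area below the obtained z.
lower_tail_area = NormalDist().cdf(z)

print(z, in_critical_region, round(lower_tail_area, 4))
```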


*Read "Critical region (R) is z equal to or less than −2.33."
†The larger the absolute value of a negative number the smaller is its
algebraic value.


10.6 THE PROBLEM OF THE PRINCIPAL AND THE SUPERINTENDENT:
SOLUTION II


Let us suppose that in Step 1 the superintendent had chosen to define
"on the whole" as the median (ξ) IQ for the population. Stated symbolically
the two possibilities now become


(1) ξ = 100
(2) ξ < 100

The solution to the problem with "on the whole" thus defined is
outlined below.


STEP 1. H*: ξ = 100; alternative: ξ < 100


STEP 2. α = .01, as before
STEP 3. R: z ≤ −2.33


Comment. Here we find the superintendent using the modification
suggested in Section 10.5. He is justified in using the normally distributed
z as a test statistic since the statistic involved (that is, the sample median,
mdn) is known to be approximately normally distributed (see Rule 9.3),
with mean ξ (i.e., with a mean equal to the median of the population
sampled).


STEP 4. The z for the sample at hand is given by

$$z = \frac{mdn - \xi}{\hat{\sigma}_{mdn}}$$

Before we can apply this formula it is necessary to obtain an estimate
of the standard error of the sampling distribution of medians (σ̂_mdn). For
this purpose the superintendent elected to use formula (9.26) as follows:

$$\hat{\sigma}_{mdn} = \frac{(1.25)(20)}{\sqrt{64}} = 3.125 \approx 3.13$$

This formula is appropriate only if the population of IQ scores sampled is
itself normally distributed. This assumption is not unreasonable in this
situation, however, since for ordinary populations of children, IQ scores
are known to be approximately normally distributed. In addition to σ̂_mdn,
the superintendent also needs to determine the value of the median (mdn)
for the sample at hand. Let us suppose that this median had the value 93,
a value slightly smaller than that of the sample mean which was 94. Then

$$z = \frac{93 - 100}{3.13} = -2.24$$


*I.e., the hypothesis to be tested.


STEP 5. Decision: Retain the hypothesis.

Since −2.24 is larger than −2.33, the obtained value of z does not fall
in the R as specified, an outcome which dictates retention of the hypothesis
that ξ = 100 in the list of possible values of ξ. It is important that the
student appreciate the fact that this outcome does not constitute proof
that ξ = 100. It means only that the evidence is not sufficiently
inconsistent with the possibility that ξ = 100 to warrant eliminating this
possibility from the list. In fact, no value belonging to the family of values
lumped into the other possibility (i.e., the possibility that ξ < 100) could
be eliminated on the basis of the evidence at hand so that both possibilities
remain in the list.*

Of course, in view of this outcome the superintendent's appropriate 
course of action is denial of the principal's request. It is important to note 
that the decision dictated by this second solution to the problem differs 
from that dictated by the outcome of the first solution in spite of the fact 
that the sample median (93) differed from the hypothesized value of the 
population median (100) by a greater amount than the sample mean (94) 
differed from the hypothesized value of the population mean (100). It is 
clear, then, that the outcome of a test of a statistical hypothesis may vary 
with certain arbitrary decisions made in the course of setting up the test.
These arbitrary decisions almost always represent subjective judgments 
on the part of the person conducting the test. The considerations basic to 
such judgments will be treated in later sections. 
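

The contrast between Solutions I and II can be laid side by side in a short Python sketch. It rests on assumptions that should be stated plainly: the variable names are ours, and the multiplier for the standard error of the median is taken as the square root of π/2 (about 1.2533), the usual large-sample value for a normally distributed population, which may differ slightly from the constant printed in formula (9.26). The result is rounded to two decimals, as in the text, before z is computed.

```python
import math

s, n = 20, 65
z_crit = -2.33                                # critical region R: z <= -2.33

se_mean = s / math.sqrt(n - 1)                # 2.5, as in Solution I

# Assumed multiplier for the median of a normal population: sqrt(pi/2).
se_mdn = round(math.sqrt(math.pi / 2) * se_mean, 2)

z_mean = (94 - 100) / se_mean                 # Solution I: sample mean 94
z_mdn = (93 - 100) / se_mdn                   # Solution II: sample median 93

decision_mean = "reject" if z_mean <= z_crit else "retain"
decision_mdn = "reject" if z_mdn <= z_crit else "retain"

print(se_mdn, round(z_mean, 2), decision_mean, round(z_mdn, 2), decision_mdn)
```

The median lies farther from 100 than the mean does, yet its larger standard error leaves its z short of the critical region.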


10.7 THE PROBLEM OF THE PRINCIPAL AND THE SUPERINTENDENT:
SOLUTION III


In this solution, we shall assume that all of the judgmental decisions 
made by the superintendent are the same as in Solution I except for the 
size sample employed. We shall here suppose that in an effort to be as 
economical as possible of the school psychologist's time, the superintendent 
elects to base his decision on a sample of 50 instead of 65. Suppose further 
that for this sample of 50, the mean and standard deviation turn out to 
have the same values as before, namely, 94 and 20 respectively.†


STEP 1. H: μ = 100; alternative: μ < 100
STEP 2. α = .01
STEP 3. R: z ≤ −2.33
STEP 4.

$$\hat{\sigma}_{\bar{X}} = \frac{20}{\sqrt{49}} = 2.86$$  [see (9.25)]

$$z = \frac{94 - 100}{2.86} = -2.10$$  [see (8.4)]


*For any hypothesized value of ξ < 100, the value of z for the sample at hand would be
greater than the value −2.24 obtained for the hypothesis ξ = 100. Since any z > −2.33
indicates retention, no hypothetical value of ξ < 100 could be rejected.

†Ordinarily one would expect some sample-to-sample variation to occur in these values.
We have elected to assume the same values in order to simplify comparisons which we
wish to make later.




STEP 5. Decision: Retain hypothesis. (Why?)


Once again the course of action dictated differs from that of Solution
I, in spite of the fact that the sample mean and standard deviation have
the same values as before. The difference in outcome arises from the fact
that the smaller the sample the larger we would expect the chance
sample-to-sample variations in the values of the sample means to become. It
follows that a discrepancy between statistic and hypothesized value of
parameter which satisfies the definition of "sufficiently improbable to
discredit the hypothesis" in the case of a large sample, may not satisfy this
definition in the case of a smaller sample.
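

The effect of sample size can be seen by repeating the computation for both samples. The Python sketch below uses our own variable names and holds the mean (94) and standard deviation (20) fixed, as the text does, varying only N.

```python
import math

x_bar, s, mu_0 = 94, 20, 100
z_crit = -2.33                         # critical region R: z <= -2.33

results = {}
for n in (65, 50):
    se = s / math.sqrt(n - 1)          # formula (9.25): s / sqrt(N - 1)
    z = (x_bar - mu_0) / se
    results[n] = (round(se, 2), round(z, 2), "reject" if z <= z_crit else "retain")

for n, (se, z, decision) in results.items():
    print(n, se, z, decision)
```

The same discrepancy of six IQ points gives z = −2.40 with 65 cases but only z = −2.10 with 50 cases, because the standard error rises from 2.50 to 2.86.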


10.8 THE PROBLEM OF THE PRINCIPAL AND THE SUPERINTENDENT:
SOLUTION IV


STEP 1. The statement of the statistical hypothesis.


In this solution we shall have the superintendent adopt quite a different
line of attack. In considering the problem we shall have him reason that
any child with an IQ of 90 or above should experience no particular
difficulty in keeping reasonably well apace with the normal program of his
school grade, while pupils with IQ scores below this level (at least those
five or more points below) may indeed experience considerable difficulty
in maintaining normal progress. In keeping with this line of reasoning we
shall have the superintendent approach the problem by inquiring into the
proportion of children in the population having IQ's below 90. A
proportion in excess of that usually found will constitute evidence in support of
the principal's contention, whereas an equal or smaller proportion will
imply refutation.

Now the superintendent is aware that in a normal (in the sense of
usual) population, WISC IQ scores are approximately normally distributed
with mean 100 and standard deviation 15. Hence, in the usual population
an IQ score of 90 corresponds to a normally distributed z of −.67, a z-value
which has a percentile rank of 25.14 (see Table II, Appendix C). Viewing
the population as dichotomous (that is, as consisting of children with IQ
scores below 90 and with IQ scores of 90 or higher), the superintendent
decides, therefore, that he will approve the principal's recommendation
only if the population proportion of children with IQ scores below 90 is
greater than one-fourth (.25). This amounts to considering only the
following possible values of the population proportion (φ) of children with
IQ scores below 90:*

(1) φ = .25
(2) φ > .25


*It is possible, of course, that φ < .25. However, the superintendent would be even less
justified in approving the principal's recommendation in this event than he would if
φ = .25. Hence, for the purpose of making the decision called for by the problem at
hand, the possibility that φ < .25 is the same as the possibility that φ = .25.




The superintendent, therefore, elects to test as a statistical hypothesis
the possibility that φ = .25. The alternative is that φ > .25.
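

The one-fourth figure on which this hypothesis rests can be checked with the standard library's `NormalDist`, as in the following Python sketch (variable names are ours). The z-value is rounded to two decimals, as it would be when entering a printed normal table, before the percentile rank is found.

```python
from statistics import NormalDist

mean_iq, sd_iq = 100, 15        # usual population of WISC IQ scores
cutoff = 90

z = round((cutoff - mean_iq) / sd_iq, 2)        # -0.67, rounded as for a table
percentile_rank = 100 * NormalDist().cdf(z)     # percent of area below z

print(z, round(percentile_rank, 2))
```

About one child in four in the usual population therefore has an IQ below 90, which is the source of the hypothesized value φ = .25.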


STEP 2. Selection of the level of significance.


Here we shall simply have the superintendent make the same choice as
in the previous solutions. That is, we shall have him let α = .01.


STEP 3. The specification of the critical region.


Now the superintendent knows that as the sample size becomes large,
the sampling distribution of a proportion (p) tends toward a normal
distribution with mean φ (see Rule 9.5) and standard error

$$\sigma_p = \sqrt{\frac{\phi(1 - \phi)}{N}}$$  [see (9.9)]


We shall have him instruct the school psychologist to obtain IQ scores for 
100 randomly selected pupils.* Then, if the hypothesis is true, that is, if 
ф = .25, it follows that 


= 


The sampling distribution of p is, therefore, approximately as shown in 
Figure 10.3. Now since the only admissible possibilities with respect to the 
value of @ are ф = .25 and ¢> .25, the only explanation for an obtained 
value of p <.25 is the operation of chance in determining the composition 
of the sample. On the other hand, an obtained value of p> .25 may be 


= V.001875 = .0433ї 


*He realizes that 100 is scarcely enough to justify use of the normal distribution model 
especially if is as small as one-fourth but does not feel warranted in investing more of 
the psychologist’s time than would be required to obtain more than 100 1Q scores. 

TBeginning students not infrequently fall into the error of using the obtained (sample) 
value of p in computing this standard error rather than the hypothesized value of Ф. 
Recall that the sampling distribution used in locating the critical region must be the 
distribution that would a were the hypothesis under test actually true. Since the 
standard error of a proportion is a function of the population proportion (ф), the speci- 
fication of the sampling distribution of p that would arise were the hypothesis true ee 
quires the use of the hypothesized value of @ in determining its standard error. The 
standard error thus determined (note that the symbol g, and not čp was used) is not an 
estimate but is rather the exact value that would apply if the hypothes trie. Tt if 
true that the sample standard deviation (8) was used in estimating the standard error of 
the sampling distributions involved in the preceding solutions of this problem. In none 
of the solutions, however, were the standard errors functions of the parameter In 
question (i.e, of u or Ẹ). Nor was the value of the population standard deviation which 
is necessary to the determination of the standard errors of the sampling distributions 
needed involved in any of the hypotheses tested. Therefore, the use of the sample $ in 
IM error was not inconsistent with, nor did it in any way violate, these 


276 


TESTING STATISTICAL HYPOTHESES 


due cither to the operation of chance or to the fact that Ф is actually greater 
than .25. The larger the value obtained for p, the more plausible the latter 
of these explanations becomes. Hence, the logical location for the critical 
region is somewhere up the p-scale from the .25 point. Since the level of 


1—7 — b 
1201 1634 .2067  .25  .2933 .3366| .3799 
p-Scale .3509 


Гісоке 10.3 Approximate model of the sampling distribu- 
tion of a proportion (p) when ф = .25 and N = 100 


Significance is to be 1 per cent (1.е., œ = .01), the lower bound of the critical 
region must correspond to the point z = + 2.33 in the unit normal distribu- 
tion. In terms of the p-scale, this point is 


p, = (0433) (+ 2.33) + -25 = -3509 [see (8.5)] 
and, hence, 
R: p= 3509 


The portion of the model sampling distribution over the critical region thus 
established is the blackened portion of Figure 10.3. 
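
The arithmetic of Step 3 can be retraced in a few lines. The following
Python sketch is added here only as an illustration; the variable names are
assumptions of this sketch, and the critical value 2.33 is the table value
used in the text.

```python
import math

phi_0 = 0.25    # hypothesized population proportion
n = 100         # sample size
z_crit = 2.33   # unit-normal point cutting off the upper 1 per cent

# Standard error of p, computed from the hypothesized proportion
# (not from the sample proportion), as in formula (9.9).
sigma_p = math.sqrt(phi_0 * (1 - phi_0) / n)

# Lower bound of the critical region expressed on the p-scale.
p_crit = phi_0 + z_crit * sigma_p

print(f"sigma_p = {sigma_p:.4f}")
print(f"critical region: p >= {p_crit:.4f}")
```

Note that the standard error is fixed before any data are examined, since
it depends only on the hypothesized φ and on N.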

Or, if we have the superintendent follow the modified procedure
described in Section 10.5, that is, if we have him use z as a test
statistic, the region may simply be specified in terms of the z-scale as
follows:

R: z ≥ +2.33

Step 4. The determination of the value of the statistic.


To determine the value of the statistic, the superintendent has only to
count the number of IQ scores in the given sample which are below 90 and
to express this number as a proportion of the total number of cases in the
sample (i.e., 100). Suppose that 36 such scores were found. Then p = .36.

Or, if we have the superintendent use the modified procedure, the test
statistic is the z-value for the sample. This is computed by formula (8.4)
as follows:

$$z = \frac{p - \phi}{\sigma_p} = \frac{.36 - .25}{.0433} = +2.54$$


Step 5. The decision.

The superintendent now refers the obtained value p = .36 to the critical
region (R: p ≥ .3509) and, noting that this value falls in R, he rejects the
hypothesis that φ = .25. This decision implies that φ > .25 since this is
the only other possibility. The action dictated by this outcome is approval
of the principal's recommendation.

Or, if the modified procedure is followed, the sample value of the test
statistic z = +2.54 is referred to the critical region, R: z ≥ +2.33, and
the same decision is again reached.
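
Steps 4 and 5 of this solution may be sketched as follows. The code is an
added illustration, not the authors' own procedure; the function name
`proportion_z` is hypothetical, and the count of 36 is the supposed outcome
given in the text.

```python
import math

def proportion_z(p_obs, phi_0, n):
    """z-value of a sample proportion, with the standard error based on
    the hypothesized proportion phi_0."""
    sigma_p = math.sqrt(phi_0 * (1 - phi_0) / n)
    return (p_obs - phi_0) / sigma_p

# 36 of the 100 sampled IQ scores are supposed to fall below 90.
z = proportion_z(p_obs=36 / 100, phi_0=0.25, n=100)

# One-ended critical region for alpha = .01: z >= +2.33.
reject = z >= 2.33

print(f"z = {z:+.2f}; reject H: {reject}")
```

Referring z to the region on the z-scale leads to the same decision as
referring p = .36 to the region on the p-scale, since the two scales differ
only by a change of origin and unit.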


10.9 THE PROBLEM OF THE PRINCIPAL AND THE SUPERINTENDENT:
SOLUTION V


In this solution we shall again have the superintendent view the
population as a dichotomous one consisting of children who are below and not
below normal in intelligence. However, we shall here have him define
below normal intelligence as IQ < 100. If the population concerned is like
the usual one, the proportion of its members having IQ scores below 100 is
one-half. With this definition of below normal, the superintendent's
interest is in the possibilities φ = .5 and φ > .5. The solution to the
problem now proceeds as follows:

Step 1. H: φ = .5; alternative: φ > .5
Step 2. α = .01, as before
Step 3. R: z ≥ +2.33
Step 4. Determine the value of the statistic.
The z for the sample at hand is again given by

$$z = \frac{p - \phi}{\sigma_p}$$

If we assume that a sample of 100 is again used, the value of σ_p for
φ = .5 is

$$\sigma_p = \sqrt{\frac{(.5)(.5)}{100}} = .05 \qquad \text{[see (9.9)]}$$

Now suppose that 61 of the 100 IQ scores comprising the sample were below
100. Then the sample value of p is .61 and

$$z = \frac{.61 - .5}{.05} = +2.20$$

Step 5. Decision: Retain the hypothesis. (Why?)


Note that the decision dictated by this solution is the opposite of that
dictated by Solution IV in spite of the fact that in each case the
difference between the obtained value of the statistic (p) and the
hypothesized value of the parameter (φ) is the same. (In Solution IV,
p − φ = .36 − .25 = .11; and in Solution V, p − φ = .61 − .50 = .11.) This
is due to the fact that sample-to-sample chance variation in the value of p
becomes greater as the value of φ approaches .5 (see formula (9.9)). On the
other hand, it should be observed that the normal distribution provides a
more accurate model of the sampling distribution of p for samples as small
as 100 when φ = .5 than when φ = .25 (see p. 253).
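
The contrast between Solutions IV and V can be made concrete with a short
computation. The Python sketch below is an added illustration; the names
`sigma_p`, `cases`, and `results` are assumptions of the sketch and do not
appear in the original text.

```python
import math

def sigma_p(phi, n):
    """Standard error of a proportion for a hypothesized phi, formula (9.9)."""
    return math.sqrt(phi * (1 - phi) / n)

Z_CRIT = 2.33   # one-ended critical value for alpha = .01 (table value)
N = 100

# (solution label, hypothesized phi, obtained p)
cases = [("IV", 0.25, 0.36), ("V", 0.50, 0.61)]

results = {}
for label, phi_0, p_obs in cases:
    se = sigma_p(phi_0, N)
    z = (p_obs - phi_0) / se
    results[label] = {"se": se, "z": z, "reject": z >= Z_CRIT}
    print(f"Solution {label}: p - phi = {p_obs - phi_0:.2f}, "
          f"sigma_p = {se:.4f}, z = {z:+.2f}, reject H: {z >= Z_CRIT}")
```

The same difference of .11 is divided by a larger standard error in
Solution V, so the resulting z falls short of the critical value there but
not in Solution IV.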


10.10 THE PROBLEM OF THE PRINCIPAL AND THE SUPERINTENDENT:
SOLUTION VI


In this, the last solution to this problem which we shall consider, we
shall have the superintendent follow the line of the preceding solution (V)
with one exception. We shall here have him take the position that while on
the one hand the principal's contention may be true, on the other the very
opposite may be true. That is, it may be that the population of children
involved is actually above normal in intelligence, and that the true
explanation of the school's low standing as measured by city-wide testing
programs and the disproportionate number of junior high school failures lies
in the direction of inefficiency and maladministration. We shall have the
superintendent wonder if it may not be that the high incidence of
delinquency among the pupils involved is symptomatic of failure to challenge
them up to the true level of their abilities, of failure to keep them
properly motivated and occupied, and of failure to maintain adequate
discipline. We shall have him reason that if these things are true then the
principal and perhaps at least certain members of his staff should be
subject to dismissal for incompetent performance of their duties.

The effect of such an attitude on the part of the superintendent is to
introduce, along with a third possibility, a third course of action. In
general terms the three possibilities and their attendant courses of action
may now be summarized as follows:

Possibility 1. The population is normal (in the sense of usual) in
intelligence.
Action 1. Deny the principal's request. Undertake to help him trouble-shoot
along other lines.

Possibility 2. The population is below normal in intelligence.
Action 2. Grant the principal's request.

Possibility 3. The population is above normal in intelligence.
Action 3. Dismiss the principal and certain members of his staff.


We shall now have the superintendent translate these possibilities into
terms amenable to statistical test as follows: Let φ represent the
proportion of children in the population whose IQ scores are below 100. Then
the three possibilities become respectively:

(1) φ = .5
(2) φ > .5
(3) φ < .5


As in the preceding solution we shall have the superintendent begin by
hypothesizing the first of these possibilities. Now the only change which
the superintendent need make in the preceding solution is in the
specification of the critical region (Step 3). As before, two possible
explanations exist for a value of p > .5, namely, (1) the operation of
chance in determining the composition of the sample at hand, and (2) the
possibility that φ > .5. Now, however, there are also two possible
explanations for a value of p < .5, namely, (1) the operation of chance as
before, and (2) the possibility that φ < .5. In this situation, therefore,
the greater the amount by which p exceeds .5, the more plausible becomes the
possibility that φ > .5, while the greater the amount by which p falls below
.5, the more plausible becomes the possibility that φ < .5. Clearly, then,
if the critical region is to function with respect to both possibilities,
part of it must be located toward the upper end of the p-scale and part
toward the lower end. We shall have the superintendent split the region
equally between the two ends. That is, we shall have him place the lower
bound of the upper part of the region at z = +2.58 since in the unit normal
distribution the probability of z ≥ +2.58 is .005. Similarly we shall have
him place the upper bound of the lower part of the region at z = −2.58. Now,
if the hypothesis is true, the probability of p falling in either part of
the region is .005 + .005 = .01, which is the selected value of α.
Symbolically this critical region may be written as follows:

R: z ≤ −2.58 and z ≥ +2.58; or |z| ≥ 2.58


Now, using the same data as in the preceding solution (i.e., using
p = .61) we obtain for the value of the test statistic z = +2.20 as before.
Since this z does not fall in either part of the critical region as
specified, the hypothesis (φ = .5) must be retained as a tenable
possibility. Though the superintendent is aware that this outcome does not
prove that φ = .5, nevertheless, the best course of action for him to
follow, with the information at hand, is that identified above as Action 1.

To round out the discussion, let us suppose that instead of 61 there
were 65 IQ scores in the sample which were below 100 in magnitude. Now
the sample value of z becomes

$$z = \frac{.65 - .5}{.05} = +3.00$$

This value of z falls in the upper part of R, dictating rejection of the
hypothesis φ = .5. This leaves two possibilities in the list, namely, φ > .5
and φ < .5. However, for any hypothesized value of φ < .5, the value of
the test statistic z would only be still greater than +3.00* so that
rejection of φ = .5 when p falls into the upper part of R also automatically
implies rejection of φ < .5, leaving φ > .5 as the only remaining
possibility.

Similarly, rejection of the hypothesis φ = .5 as a result of a value of p
falling into the lower part of R also automatically implies rejection of
φ > .5, leaving φ < .5 as the only remaining possibility.
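
The two-ended decision rule of this solution can be sketched as a small
function. The code below is an added illustration under the assumptions of
the text (N = 100, hypothesized φ = .5, table value 2.58); the function name
`two_ended_decision` is hypothetical.

```python
from statistics import NormalDist

Z_CRIT = 2.58   # two-ended critical value for alpha = .01 (table value)

def two_ended_decision(p_obs, phi_0=0.5, sigma_p=0.05, z_crit=Z_CRIT):
    """Return (z, decision) for the two-ended test of H: phi = phi_0."""
    z = (p_obs - phi_0) / sigma_p
    if z >= z_crit:
        return z, "reject H; conclude phi > .5"
    if z <= -z_crit:
        return z, "reject H; conclude phi < .5"
    return z, "retain H"

# Probability of landing in either part of R when H is true.
alpha_total = 2 * (1 - NormalDist().cdf(Z_CRIT))

z_61, decision_61 = two_ended_decision(0.61)
z_65, decision_65 = two_ended_decision(0.65)

print(f"alpha = {alpha_total:.4f}")
print(f"p = .61: z = {z_61:+.2f}, {decision_61}")
print(f"p = .65: z = {z_65:+.2f}, {decision_65}")
```

Each of the three outcomes of the function corresponds to one of the three
courses of action open to the superintendent.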


10.11 CHOOSING THE LEVEL OF SIGNIFICANCE:
THE TWO TYPES OF ERROR


The choice of a level of significance (α), that is, the selection of some
small probability value as the definition of what is meant by "sufficiently
improbable of occurrence to discredit the hypothesis," is actually a
nonstatistical problem in the sense that it calls for a purely arbitrary
subjective judgment. The levels most commonly judged suitable are .01 and
.05. Occasionally .001, .02, .10 and even .20 are selected. The type of
considerations which enter into the formulation of this judgment can best be
appreciated in the perspective of an analysis of the kinds of errors which
may arise in connection with tests of statistical hypotheses.

Obviously, one of two possibilities applies to any statistical hypothesis
(H): either (1) it is true; or (2) it is false. If it is true, there are
still two courses of action to which our test may lead: either (1) we retain
this true H, the desired correct action; or (2) we reject it, the undesired
erroneous action. Similarly, if H is false, there are also two courses of
action to which our test may lead: either (1) we reject this false H, the
desired correct action; or (2) we retain it, the undesired erroneous action.
These two undesired erroneous actions are clearly different in character.
Since one can occur only if the H under test is false, and the other only if
it is true, they are mutually exclusive in any given situation. That is to
say, both cannot occur at the same time. These two kinds of errors are
identified respectively as errors of the first and second kind or type.

DEFINITION. A Type I error, or an error of the first kind, consists in
rejecting a hypothesis that is actually true.

DEFINITION. A Type II error, or an error of the second kind, consists in
retaining a hypothesis that is actually false.

Now if the H under test is in fact true, the probability of the value of
the test statistic (S) falling in the critical region (R) is equal to α,
that is, to the level of significance chosen (e.g., see Figures 10.2 and
10.3). But if S falls in R, rejection of this true H is indicated. That is,
H being true, the occurrence of an S in R implies the occurrence of a Type I
error, and, hence, α represents the relative frequency with which Type I
errors would occur with long-run repetition of the particular statistical
test. We are now in a position to present a more precise definition of level
of significance.

*For example, if φ = .49, z = (.65 − .49)/.05 = +3.20.


DEFINITION. In situations in which Type I errors are possible, the level
of significance (α) is the probability of such an error.


In considering this definition the student should recognize: (1) that a
Type I error can only occur if H is true; and (2) that if H is true, S would
nevertheless fall in R 100α per cent of the time, were we to conduct many
independent repetitions of this particular statistical test. Thus, through
the selection of α, we have at our disposal a means of controlling the
likelihood of a Type I error.
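
The long-run meaning of α may be illustrated with the critical region of
Solution IV. The Python sketch below is an addition to the text and rests on
assumptions of its own: it uses the exact binomial distribution of the count
of scores below 90 rather than the normal model, and it takes the region
p ≥ .3509 to mean 36 or more such scores in a sample of 100. The function
name `binomial_upper_tail` is hypothetical.

```python
from math import comb

def binomial_upper_tail(k_min, n, phi):
    """Exact probability of k_min or more 'successes' in n independent
    trials when the population proportion is phi."""
    return sum(
        comb(n, k) * phi**k * (1 - phi) ** (n - k)
        for k in range(k_min, n + 1)
    )

# Probability that p falls in R (36 or more of 100 scores below 90)
# when the hypothesis phi = .25 is actually true.
type_i_rate = binomial_upper_tail(36, 100, 0.25)

print(f"exact probability of a Type I error: {type_i_rate:.4f}")
```

Because the count is a whole number and the normal curve is only an
approximate model for N = 100, the exact figure need not equal .01
precisely, but it should be of that order.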

At this point the student may wonder why an α as large as .05 would be
common, or why an α of .10 or .20 would ever be used, when the choice of
smaller probability values for α would have the effect of markedly reducing
the likelihood of occurrence of a Type I error. It is, in fact, possible to
eliminate the occurrence of Type I errors entirely. To accomplish this, all
we have to do is to let α = 0. This, of course, implies that no critical
region exists. In other words it amounts to deciding, regardless of the
strength of the evidence to the contrary, always to retain any H tested. In
fact, it would be quite unnecessary under such a rule of operation ever to
bother to analyze, or, for that matter, even collect any data at all. All
that would be necessary would be to state H and then retain it. Obviously,
while such a procedure would completely eliminate the possibility of making
a Type I error, it does not provide a guarantee against error, for every
time that the H stated was false, a Type II error would necessarily occur.
Similarly, by letting α = 1 it would be possible to eliminate entirely the
occurrence of Type II errors at the cost of committing a Type I error for
every true H tested.

It is clear, from the foregoing remarks, that the choice of a level of
significance must represent a compromise effort at controlling the two
types of error which may occur in testing statistical hypotheses. Just what
compromise is most appropriate in a given situation depends upon a
comparative evaluation of the seriousness of the consequences of these two
types of error.


For purposes of illustration, consider the problem of the principal and
the superintendent. If we suppose that the implementation of the
principal's recommendations would involve a very large outlay of cash
from funds for which many important demands exist, we might list, at least
partially, the consequences of the two types of error somewhat as follows:

Consequences of a Type I Error. (Consequences of approving the principal's
recommendations when the appropriate action is disapproval.)

Purposeless expenditure of a large sum of tax money when other important
needs for this money exist and, when the error becomes known, the attendant:

1. public criticism;
2. loss of school board members' confidence;
3. loss of staff members' confidence;
4. possible creation of staff dissension resulting from singling out one
building for special aid;
5. general over-all damage to professional reputation;
6. possible loss of superintendency.
Consequences of a Type II Error. (Consequences of disapproving the
principal's recommendations when the appropriate action is approval.)

Failure to provide needed special facilities which may in the end, by a
reduction in the incidence of delinquency and by providing the children
involved with a better start on the road toward good citizenship, represent
an actual saving to the taxpayers, and, when the error becomes known, the
attendant:

1. public criticism;
2. loss of school board members' confidence;
3. loss of staff members' confidence;
4. loss of principal, and perhaps some of his teachers, owing to their
unwillingness to continue in an intolerable situation that could have
been remedied;
5. general over-all damage to professional reputation;
6. possible loss of superintendency.


Although the two lists of attendant consequences appear almost identical,
they nevertheless stem from differing basic causes and, hence, may differ
markedly in degree. For example, if, as we have assumed, the cash outlay is
great and other important needs for the money exist, the superintendent may
regard the public criticism attendant upon a Type I error as much more
serious than that which would be attendant upon a Type II error. Under such
circumstances a Type II error might be excused as representing a not too
unreasonable degree of conservatism in the management of tax monies, while a
Type I error would appear to be almost inexcusable. Similarly, all other
consequences attendant upon a Type I error become more serious than their
Type II error counterparts, and the superintendent would, therefore, feel a
very strong need for preventing a Type I error. In this situation he would
be led to choose a small α. While we have had him use α = .01, it might well
be that in the situation we have just described α = .001 would be even more
defensible.

On the other hand, suppose that the principal's recommendations are
relatively inexpensive to implement and that money represents no particular
problem. Now the various consequences of a Type II error may become the more
serious, since failure to provide needed facilities may now be attributed to
lack of insight, to lack of wisdom, or even to neglect, rather than to
justifiable conservatism in the management of tax funds. Thus, a Type I
error may become a matter of much less concern, justifying an α of .10 or
even .20.


Though exceptional situations may arise, it is usually true that the
consequences associated with Type I errors are the more serious. Retention
of H, unless necessarily accompanied by some critical action, is an
inconclusive sort of result. The H, while retained, is not proved, a fact
which may in effect serve to invite further research with perhaps improved
methods. On the other hand, rejection of H represents a somewhat more
conclusive type of action which may have a greater tendency to lead to
general acceptance of the finding and the discouragement of further
research on the problem. Thus, most investigators prefer to be cautious
rather than precipitous about rejecting a hypothesis.

There exists an even more important reason for exercising caution with
respect to Type I errors. It may be possible in certain instances, at least,
to exercise some degree of control over a Type II error quite independent of
that exercised over a Type I error. That is, for a given choice of α, we may
be in a position to manipulate the probability (β) of a Type II error. In
other words, we may be able to choose a fairly small α and still, at the
same time, maintain a small β, i.e., a small likelihood of a Type II error.
At least we may be able to accomplish this in those situations in which a
Type II error might become a matter of real concern. It is for these reasons
that α-values in excess of .05 are rarely used. In fact, such α-values
should be used only when accompanied by special justification. As will be
explained in the next section, it is actually only the Type I error over
which we can exercise a complete arbitrary control. While there are ways in
which we may, for a given α, reduce the likelihood of a Type II error, we
can never be certain of the exact degree of control we are exercising over
this type of error.


10.12 CONTROLLING TYPE II ERRORS


The probability, β, of a Type II error depends upon four factors: (1)
the value of α selected, i.e., the degree of protection against a Type I
error; (2) the location of the critical region, R; (3) the variability of
the sampling distribution of the statistic, S; and (4) the amount by which
the actual value, θ, of the parameter differs from the value, H, which is
hypothesized for it. Because in any real situation θ is unknown, the last of
these four factors can never be known. It is for this reason that the degree
of control exercised by a given statistical test over a Type II error can
never be determined. We can only indicate, in the case of a particular
statistical test, what this degree of control would be for an assumed
discrepancy between θ and H.


To illustrate we shall determine the value of β in Solution I of the
problem of the principal and the superintendent in the special case in which
the actual mean for the population involved is assumed to be 90 IQ points.
In Solution I, R, in terms of the X̄-scale, extended downward from 94.17.
If, as we have assumed, μ = 90, the approximate sampling distribution of
X̄ will be a normal distribution with mean at 90 and an estimated standard
error of 2.5, as before. This distribution is pictured in Figure 10.4. Now
in this situation a Type II error will occur when X̄ > 94.17. The proportion
of the area of the sampling distribution above 94.17 (see shaded portion of
Figure 10.4) is approximately .0475.* Hence, the approximate probability,
β, of a Type II error is .0475. That is, if in this situation our particular
statistical test were to be repeated indefinitely, 4.75 per cent of the
decisions it would direct us to make would be errors of the second kind.†

[FIGURE 10.4 Approximate sampling distribution of X̄ for random samples of
65 cases selected from a population having μ = 90. Horizontal axis: X̄-scale,
with the upper limit of R at 94.17.]
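
The area just described may be obtained without a printed table. The
following Python sketch is an added illustration; the function name
`beta_lower_tail` is hypothetical, and z is rounded to two decimals only to
imitate the use of a table such as Table II.

```python
from statistics import NormalDist

def beta_lower_tail(mu_true, x_crit, se):
    """Probability that the sample mean falls above x_crit, i.e. outside
    a critical region that extends downward from x_crit."""
    # Rounded to two decimals, as a printed normal table would require.
    z = round((x_crit - mu_true) / se, 2)
    return 1 - NormalDist().cdf(z)

# Solution I: R extends downward from 94.17; the actual mean is assumed
# to be 90 and the estimated standard error is 2.5.
beta = beta_lower_tail(mu_true=90, x_crit=94.17, se=2.5)

print(f"beta = {beta:.4f}")
```

The value depends on the assumed mean of 90; a different assumed mean would
give a different β, which is the point developed in the remainder of this
section.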

To illustrate how the choice of α affects the value of β we shall suppose
that in Solution I the superintendent had selected an α of .001. In this
case R would have extended downward from approximately 92.28 [since
X̄_R = (2.5)(−3.09) + 100 = 92.28]. In the distribution of Figure 10.4 an
X̄-value of 92.28 corresponds to a z-value of +0.91 and the approximate
value of β now becomes .1814. Similarly, had the superintendent elected to
use α = .05, the approximate value of β would be only .0091.‡ Thus, we
see how the use of a smaller α increases the probability of a Type II error,
whereas the use of a larger α decreases it.
To illustrate the effect of the location of R upon the value of β, let us
suppose that in Solution I the superintendent's approach to the problem
was similar to that described in Solution VI, in that he wished to consider
not only the alternative possibility that μ < 100 but also the alternative
possibility that μ > 100. In this case he would, of course, locate R so that
part of it would lie at each end of the hypothesized sampling distribution.
The lower part would extend downward from X̄ = 93.55 [since
X̄_R = (2.5)(−2.58) + 100 = 93.55] and the upper part would extend upward
from X̄ = 106.45 [since X̄_R = (2.5)(+2.58) + 100 = 106.45]. Now, if, as
before, we assume the actual value of μ to be 90, then the value of β is
the probability of X̄ in that part of the scale between 93.55 and 106.45.
This is the same as the probability of z between +1.42 and +6.58, which,
for all practical purposes, is simply the probability of z > +1.42. Hence,
in this situation, the approximate value of β is .0778, and we see that the
price for guarding against the additional alternative that μ > 100 is an
increase in β from .0475 to .0778.

*94.17 corresponds to z = +1.67. The area above z = +1.67 may be obtained
from Table II, Appendix C.

†It should be noted that if μ = 90, and the H-value is taken to be 100,
errors of the first kind are impossible. Why?

‡The student should verify this result.

To illustrate the effect of the variability of the sampling distribution
upon the value of β consider Solution II to the problem of the principal
and the superintendent. In this solution, which was based on the median
rather than the mean, the approximate standard error of the sampling
distribution was 3.13 as compared with 2.5 in Solution I. In terms of the
scale of values of the median, R extends downward from 92.71 [since
mdn_R = (3.13)(−2.33) + 100 = 92.71]. Now, if the population median,
ξ, is 90 then the approximate sampling distribution of the median is a
normal distribution with mean at 90 and an estimated standard error of
3.13. In this situation β is the probability of a median value greater than
92.71, or the probability of a z-value greater than +0.87. Hence, β = .1922
and we see that the use of this less stable statistic (mdn) is at the price
of an increase in β from .0475 to .1922.

Finally, we shall illustrate the effect upon β of the amount by which
the actual value of the parameter differs from the value hypothesized for it.
Assume the actual value of the population mean to be 95 instead of 90,
which is only 5 points, rather than 10, below the hypothesized value. Then,
of course, the sampling distribution of X̄ will be centered on 95 and the
upper limit of R (i.e., 94.17; see Solution I) will be in the lower half of
this distribution. In this situation β is the probability of an X̄-value
greater than 94.17 or of a z-value greater than −0.33. Hence, β is
approximately .6293. On the other hand, if the actual value of the
population mean is assumed to be 85, that is, a distance of 15 IQ points
below the hypothesized value, then the probability of an X̄-value greater
than 94.17 corresponds to the probability of a z-value greater than +3.67 so
that β = .0001. Thus, we see that the closer μ is to the hypothesized value
(H), the more likely we are to commit a Type II error, while the further μ
is from H, the less likely we are to commit such an error. This is clearly a
desirable feature of the test procedure. The variations in the value of β
associated with the situations we have presented are summarized in Table
10.1.
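
The seven situations just discussed can be recomputed together. The Python
sketch below is an added illustration, not part of the original text. Its
function and variable names are hypothetical; the critical values are the
table values quoted in the text, except that −1.645 for α = .05 is an
assumption of the sketch, since the text does not state the value used.
Limits of R and z-values are rounded to two decimals to imitate the hand
computation.

```python
from statistics import NormalDist

UNIT_NORMAL = NormalDist()
H_VALUE = 100.0   # hypothesized value of the parameter

def beta(mu_true, se, z_lower, z_upper=None):
    """Probability that the statistic misses the critical region when the
    sampling distribution is actually centred on mu_true."""
    # Upper limit of the lower part of R, rounded as in the text.
    lower_limit = round(H_VALUE + z_lower * se, 2)
    z_lo = round((lower_limit - mu_true) / se, 2)
    if z_upper is None:
        # One-ended region: a Type II error occurs above the lower limit.
        return 1 - UNIT_NORMAL.cdf(z_lo)
    # Two-ended region: a Type II error occurs between the two limits.
    upper_limit = round(H_VALUE + z_upper * se, 2)
    z_hi = round((upper_limit - mu_true) / se, 2)
    return UNIT_NORMAL.cdf(z_hi) - UNIT_NORMAL.cdf(z_lo)

# (true value, alpha, region, statistic, standard error, z_lower, z_upper)
situations = [
    (90, ".01",  "lower end", "mean",   2.5,  -2.33,  None),
    (90, ".001", "lower end", "mean",   2.5,  -3.09,  None),
    (90, ".05",  "lower end", "mean",   2.5,  -1.645, None),  # assumed value
    (90, ".01",  "both ends", "mean",   2.5,  -2.58,  2.58),
    (90, ".01",  "lower end", "median", 3.13, -2.33,  None),
    (95, ".01",  "lower end", "mean",   2.5,  -2.33,  None),
    (85, ".01",  "lower end", "mean",   2.5,  -2.33,  None),
]

betas = [beta(mu, se, z_lo, z_hi) for mu, _, _, _, se, z_lo, z_hi in situations]

for (mu, alpha, region, stat, se, _, _), b in zip(situations, betas):
    print(f"{mu:>3}  {alpha:>5}  {region:<10} {stat:<7} {se:>5}  {b:.4f}")
```

Each row changes one of the four factors listed at the opening of this
section while holding the others at their Solution I values, which makes
the separate effect of each factor on β easy to see.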

TABLE 10.1 Summary of Variations in β in Seven Selected
Illustrative Situations

μ     α      R            S      σ_S    β
90    .01    Lower End    X̄      2.5    .0475
90    .001   Lower End    X̄      2.5    .1814
90    .05    Lower End    X̄      2.5    .0091
90    .01    Both Ends    X̄      2.5    .0778
90    .01    Lower End    mdn    3.13   .1922
95    .01    Lower End    X̄      2.5    .6293
85    .01    Lower End    X̄      2.5    .0001

It should now be clear that Type II errors cannot be controlled in the
same arbitrary manner as Type I errors. In fact, the probability of a
Type II error in a given situation can only be estimated for particular
assumed values of the population parameter. It may occur to the student,
therefore, that our discussion of this problem is more theoretical than
practical. Although this may be true to some extent, there is, nevertheless,
much to be gained in planning statistical tests from an analysis of the
expected frequency of Type II errors for various possible alternative values
of the parameter. How this can be accomplished with tests of the type we
have been illustrating will be shown in the following section.


10.13 THE POWER OF A STATISTICAL TEST


Suppose that the actual value of a population parameter, θ, differs by
some particular amount from the value, H, hypothesized for it. This fact,
of course, is not known to the statistician testing H and he selects some
level of significance (α) to afford him that degree of protection against a
Type I error which he deems necessary. We have illustrated how, in such
a situation, the probability (β) of occurrence of a Type II error may still
vary depending upon the critical region (R) chosen and/or upon the
variability of the sampling distribution of the statistic (S) employed. Now
in this situation rejection of H is the desired correct outcome. The
probability that this outcome will be reached is the probability that S
falls in R. We shall refer to this probability as the power (P) of the test.
Since β represents the probability that S does not fall in R, and since S
either does or does not fall in R, it follows that P = 1 − β.
DEFINITION. The power of a test of a statistical hypothesis, H, is the
probability, P, that it will lead to rejection of H when the true value of
the parameter, θ, differs from H. Or, the power of a statistical test is the
probability that the statistic, S, will fall in the critical region, R, when
θ differs from H.

In other words the power of a test is the probability that it will detect
falsity in the hypothesis. Now since P = 1 − β, and since β can be evaluated
only for assumed values of θ, it follows that P, also, can be evaluated
only for assumed values of θ. However, this does not in any way prevent
the concept of the power of a statistical test from being a useful criterion
for the evaluation of such tests. It is used in comparing statistical tests
by simply determining their respective powers for all values of θ which are
possible alternatives to H. Such determinations are usually presented
graphically in the form of power curves.

DEFINITION. The power curve of a test of a statistical hypothesis, H, is
the plot of the P-values which correspond to all θ-values that are possible
alternatives to H.


As an example, we shall construct the power curve for the statistical
test employed in Solution I of the problem of the principal and the
superintendent. In this solution there exists an infinite collection of
μ-values (μ < 100) which are possible alternative values to the hypothesized
value of 100. Obviously, in this situation we cannot determine the P-values
associated with all possible alternative μ-values. We shall content
ourselves, therefore, with the determination of P-values corresponding to
selected possible alternative μ-values. After plotting these P-values we
shall use them as guide points to sketch the smooth continuous curve which
is the locus of all such P-values. We shall begin by determining the P-value
corresponding to μ = 98.

1. Determination of P for μ = 98.
R in Solution I is X̄ ≤ 94.17, and if μ = 98, then the actual X̄ distribution
is as pictured in Figure 10.5.
Here P = P(X̄ ≤ 94.17 | ND: μ = 98; σ_X̄ = 2.5)* (see shaded area in
Figure 10.5).
But 94.17 corresponds to z = (94.17 − 98)/2.5 = −1.53

    P = P(z ≤ −1.53 | ND: μ = 0; σ = 1) = .063

2. Determination of P for μ = 96.
Here P = P(X̄ ≤ 94.17 | ND: μ = 96; σ_X̄ = 2.5)
Since in this situation the sampling distribution is centered on 96, it
follows that 94.17 corresponds to z = (94.17 − 96)/2.5 = −0.73.

    P = P(z ≤ −0.73 | ND: μ = 0; σ = 1) = .233

3. Other values of P determined similarly† are:

    For μ = 94, P = .527.
    For μ = 92, P = .808.
    For μ = 90, P = .953.
    For μ = 88, P = .993.

*Read: Power equals the probability of a value of 94.17 or less in a normally distributed
universe having a mean of 98 and an estimated standard deviation of 2.5.
†It is important that the student verify enough of these values to master the procedure.
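
The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `power_lower_tail` and the use of the standard library's `NormalDist` in place of a printed normal table are assumptions of this example, not part of the text. Because the code does not round z to two decimals before evaluating the normal area, its results may differ from the tabled values in the third decimal place.

```python
from statistics import NormalDist


def power_lower_tail(mu_true, upper_limit=94.17, std_error=2.5):
    """Probability that the sample mean falls at or below upper_limit
    when its sampling distribution is normal with mean mu_true."""
    return NormalDist(mu_true, std_error).cdf(upper_limit)


# Selected alternatives to H = 100, used as guide points for the power curve.
for mu in (98, 96, 94, 92, 90, 88):
    print(f"mu = {mu}: P = {power_lower_tail(mu):.3f}")
```

Plotting the printed values against μ and joining them with a smooth curve gives the kind of power curve described in the text.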


The P-values corresponding to these selected μ-values have been
plotted in Figure 10.6 (see dots). The smooth curve sketched through
these P-values is the power curve of the particular statistical test used in
Solution I of the problem of the principal and the superintendent. The
D-scale placed below the μ-scale simply indicates the discrepancies between
the possible alternative μ-values and H = 100. The order of subtraction
used was μ − H so that negative D-values indicate μ-values less than H.

FIGURE 10.5 Approximate sampling distribution of X̄ for
random sample of 65 cases from a population having μ = 98


FIGURE 10.6 Power curve of statistical test of Solution I of the Problem
of the Principal and the Superintendent (vertical axis: P-scale; horizontal
axes: μ-scale from 85 to 100 = H and D-scale from −15 to 0; α = .01)


This power curve may be used to read the probability of rejecting H for
any given possible alternative value of μ. It will be observed that the
power of the test increases as the discrepancy (D) between μ and H
increases in absolute value. Thus, for a D of −5, the chances of the test
detecting the falsity of H = 100 are only about four out of ten (actually
371 in a thousand), whereas for a D of −10, the chances become better
than nine out of ten (actually 953 in a thousand).


FIGURE 10.7 Power curves for Solutions I, II, and III of the Problem
of the Principal and the Superintendent (vertical axis: P-scale;
horizontal axis: μ or ξ scale from 85 to 100 = H)


To illustrate how power curves may be used to assess the relative
effectiveness of various statistical tests, we have superimposed the curves
for the first three solutions to the problem of the principal and the
superintendent on the same axes (see Figure 10.7). For each of these tests the
P-values corresponding to selected alternative values of the parameter are
given in Table 10.2. These are the values which were plotted as guide
points in sketching the curves. We have already shown how the P-values
were computed for Solution I. We will show how the P_II and P_III values
were determined for ξ or μ = 98. The student should verify other P_II and
P_III values until he feels confident of the procedure.

TABLE 10.2 Values of P Corresponding to Selected Values of μ or ξ for
Solutions I, II, and III of the Problem of the Principal and the
Superintendent

    μ or ξ      P_I     P_II    P_III
    100 = H     .01     .01     .01
    98          .063    .046    .052
    96          .233            .176
    94          .527            .409
    92          .808            .681
    90          .953            .879
    88          .993            .969
    86                          .995
    84

To determine P_II for ξ = 98 we first need to express R in terms of the
scale of values of the statistic (mdn) used. The upper limit of R in terms of
the mdn-scale is given by:

    mdn_R = (3.13)(−2.33) + 100 = 92.71

i.e., R: mdn ≤ 92.71
Then P_II = P(mdn ≤ 92.71 | ND: μ = ξ = 98; σ_mdn = 3.13)
But in this ND, 92.71 corresponds to z = (92.71 − 98)/3.13 = −1.69

    P_II = P(z ≤ −1.69 | ND: μ = 0; σ = 1) = .046

Similarly to determine P_III for μ = 98 we must first determine R in
terms of the X̄-scale. Here we have

    X̄_R = (2.86)(−2.33) + 100 = 93.34

i.e., R: X̄ ≤ 93.34
Then P_III = P(X̄ ≤ 93.34 | ND: μ = 98; σ_X̄ = 2.86)
But in this ND, 93.34 corresponds to z = (93.34 − 98)/2.86 = −1.63.

    P_III = P(z ≤ −1.63 | ND: μ = 0; σ = 1) = .052
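
The two-step routine used above (locate the limit of R on the scale of the statistic, then find the area beyond that limit under the actual sampling distribution) can be sketched as follows. The names `critical_limit`, `power`, and `STD_ERRORS` are hypothetical, chosen for this illustration only; the standard errors are those quoted in the text, and the unrounded normal areas may differ slightly from the tabled values.

```python
from statistics import NormalDist

Z_ALPHA = -2.33  # table value cutting off the lower .01 of the unit normal curve
H = 100          # hypothesized value of the parameter

# Standard errors quoted in the text for Solutions I, II, and III.
STD_ERRORS = {"I": 2.5, "II": 3.13, "III": 2.86}


def critical_limit(std_error):
    """Upper limit of the lower-tail critical region on the statistic's scale."""
    return std_error * Z_ALPHA + H


def power(true_value, std_error):
    """Probability that the statistic falls in R when the parameter equals true_value."""
    return NormalDist(true_value, std_error).cdf(critical_limit(std_error))


for name, se in STD_ERRORS.items():
    print(f"Solution {name}: limit = {critical_limit(se):.2f}, "
          f"P at 98 = {power(98, se):.3f}")
```

Calling `power` over a range of alternative values for each standard error yields guide points for three curves of the kind compared in Figure 10.7.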

Inspection of the power curves in Figure 10.7 shows the statistical test
of Solution I to be the most powerful of the three for any value of the
parameter alternative to H = 100, while the test of Solution II is least
powerful for any alternative value of the parameter. It should also be
observed that if μ or ξ equals the hypothesized value of 100, then for all
three tests the probability of the test statistic falling in R is equal to the
selected level of significance (α = .01). That is to say, all three tests are
equally effective at an arbitrarily predetermined level insofar as control
over a Type I error is concerned, which, of course, is the only type of error
possible when μ or ξ equals H.

The test that is the most powerful is the one for which the standard error
of the statistic employed is the smallest. For a given N the standard error
of the median, which was the test statistic used in Solution II, is about
1.25 times larger than that of the mean [see (9.7)]. The mean was again
used as the test statistic in Solution III but this time with a smaller sample
so that a sampling distribution more variable than that of Solution I
resulted.


Figure 10.8 was developed to help the student obtain a clearer picture
of how the variability of the sampling distribution affects the power of
statistical tests. The curve on the right in the upper part of Figure 10.8
represents the hypothesized sampling distribution of X̄ for the statistical
test of Solution I of the problem of the principal and the superintendent.
The other upper curve represents the actual sampling distribution of X̄ as it
would appear if the population value of μ were 95. The shaded portion of
this latter curve represents the probability of X̄ in R, that is, the power of
the test. When μ = 95 this test has only about four chances out of ten
(actually P = .371) of detecting the falsity of H = 100.

FIGURE 10.8 Comparison of powers of two tests of H = 100 if μ = 95 and
where standard error of one test is 2.5 times that of the other

Now suppose the investigation had been conducted in such a way as to
reduce the standard error of the sampling distribution from 2.5 to one.*
The right-hand curve in the lower part of Figure 10.8 represents the
hypothesized sampling distribution of X̄ as it would now appear. Note that R
in this situation has the upper limit 97.67, since (1)(−2.33) + 100 = 97.67.
The other lower curve represents the actual sampling distribution of X̄ as
it would appear if μ = 95. It is clear that the effect of thus reducing the
standard error is to provide a test that is almost certain (P = .996) to detect
the falsity of H = 100 when μ = 95.
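
The comparison pictured in Figure 10.8 can be condensed into a single expression. For a critical region at the lower end, the limit of R is σz_α + H, so the power for a discrepancy D = μ − H is the unit-normal area below z_α − D/σ. The sketch below merely evaluates that expression; the function name `power_from_discrepancy` is an assumption of this example.

```python
from statistics import NormalDist


def power_from_discrepancy(d, std_error, z_alpha=-2.33):
    """Power of a lower-tail test: unit-normal area below z_alpha - D / std_error."""
    return NormalDist().cdf(z_alpha - d / std_error)


# Same discrepancy (D = -5), two different standard errors.
for se in (2.5, 1.0):
    print(f"standard error = {se}: P = {power_from_discrepancy(-5, se):.3f}")
```

Written this way, it is plain that the power depends on the discrepancy only through the ratio D/σ, which is why shrinking the standard error has the same effect as enlarging the discrepancy.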

We shall conclude this section with a comparison of the powers of the
statistical tests used in Solutions IV, V, and VI of the problem of the
principal and the superintendent. The powers corresponding to selected
differences between H and possible alternative values of the population parameter
are given in Table 10.3. These P-values are plotted and the power curves
shown in Figure 10.9. We shall present the computation of P_IV, P_V, and
P_VI for D = .12. The student should verify other selected P-values until
confident of the procedure.

TABLE 10.3 Values of P Corresponding to Differences (D) between H and
Selected Possible Alternative Values of φ for Solutions IV, V, and VI of
the Problem of the Principal and the Superintendent

    D = φ − H    P_IV    P_V     P_VI
    0            .01     .01
    +.02         .034    .027
    +.04                 .062
    +.06                 .127
    +.08                 .230
    +.10                 .367    .278
    +.12         .655    .528    .425
    +.14                 .688
    +.16                 .821
    +.18                 .913
    +.20                 .966
    +.22                 .990
    +.24                 .998

*This could be done by increasing the sample to about 400 cases. For, assuming S to
remain fairly stable (in Solution I we assumed S to be 20), we have σ_X̄ ≈ 20/√400 = 1.


FIGURE 10.9 Power curves for Solutions IV, V, and VI of the Problem
of the Principal and the Superintendent (vertical axis: P-scale;
horizontal axis: D-scale from −.20 to +.20)


1. P_IV for D = φ − H = .37 − .25 = .12
Here R: p ≥ .3509 (see p. 277)
Now, if φ = .37, the actual sampling distribution of p is approximately
a normal distribution with μ = .37 and

    σ_p = √(.37 × .63/100) = .0483

Therefore, P_IV = P(p ≥ .3509 | ND: μ = .37; σ_p = .0483)
But in this ND, p = .3509 corresponds to z = (.3509 − .37)/.0483 = −.40

    P_IV = P(z ≥ −.40 | ND: μ = 0; σ = 1) = .655

2. P_V for D = φ − H = .62 − .50 = .12
Here R: z ≥ 2.33 or p ≥ (.05)(2.33) + .5 = .6165
Now, if φ = .62, p is approximately normally distributed with μ = .62
and

    σ_p = √(.62 × .38/100) = .0485

Therefore, P_V = P(p ≥ .6165 | ND: μ = .62; σ_p = .0485)
But in this ND, p = .6165 corresponds to z = (.6165 − .62)/.0485 = −.07

    P_V = P(z ≥ −.07 | ND: μ = 0; σ = 1) = .528

3. P_VI for D = φ − H = .62 − .50 = .12
Here R: z ≤ −2.58 and z ≥ +2.58, or
p ≤ (.05)(−2.58) + .50 = .371 and p ≥ (.05)(+2.58) + .50 = .629
Now, if φ = .62, p is approximately normally distributed with μ = .62
and

    σ_p = √(.62 × .38/100) = .0485

Therefore, P_VI = P(p ≤ .371 | ND: μ = .62; σ_p = .0485)
                + P(p ≥ .629 | ND: μ = .62; σ_p = .0485)
But in this ND, p = .371 corresponds to z = (.371 − .62)/.0485 = −5.13
and p = .629 corresponds to z = (.629 − .62)/.0485 = +.19
so P_VI = P(z ≤ −5.13 | ND: μ = 0; σ = 1)
        + P(z ≥ +.19 | ND: μ = 0; σ = 1)
        = .000 + .425 = .425
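
The same computations can be sketched for proportions. The point to notice is that the limit of R is fixed by the standard error under H, whereas the area is taken under a normal curve whose standard error is that of the alternative value of φ. The function names below are hypothetical; the sample size of 100 and the table values 2.33 and 2.58 are those used in the text. Unrounded normal areas may differ from the tabled results by one or two units in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

N = 100  # sample size used in Solutions IV, V, and VI


def std_error(phi, n=N):
    """Standard error of a sample proportion when the population value is phi."""
    return sqrt(phi * (1 - phi) / n)


def power_upper(phi_h, phi_true, z_crit=2.33):
    """One-ended test with R at the upper end (Solutions IV and V)."""
    limit = std_error(phi_h) * z_crit + phi_h
    return 1 - NormalDist(phi_true, std_error(phi_true)).cdf(limit)


def power_two_ended(phi_h, phi_true, z_crit=2.58):
    """Two-ended test (Solution VI): sum of the areas in both portions of R."""
    dist = NormalDist(phi_true, std_error(phi_true))
    lower = phi_h - std_error(phi_h) * z_crit
    upper = phi_h + std_error(phi_h) * z_crit
    return dist.cdf(lower) + (1 - dist.cdf(upper))


print(f"P_IV = {power_upper(0.25, 0.37):.3f}")
print(f"P_V  = {power_upper(0.50, 0.62):.3f}")
print(f"P_VI = {power_two_ended(0.50, 0.62):.3f}")
```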
We see from an inspection of Figure 10.9 that of these three last
solutions to the problem of the principal and the superintendent, IV is the most
powerful for alternative values of φ greater than the values hypothesized.
Solution IV is more powerful than Solution V for such alternative values,
owing to the fact that the standard error of the sampling distribution of p
decreases as the value of the parameter, φ, differs more and more from .5
[see (9.0)]. It will be recalled that both H and the possible alternative
values differed more from .5 in Solution IV than in Solution V.* This
advantage of Solution IV over V may, however, be more apparent than
real. It is real only if we can regard the difference between, say, .25 and
.27 as representing a difference of the same order of magnitude as that
between .50 and .52. Furthermore, the normal distribution is a much less
accurate model of the sampling distribution of p in Solution IV than in
Solution V (see p. 253).

Solution V is more powerful than Solution VI for possible alternative
values of φ greater than H. This is because Solution VI not only provides
protection against a Type II error for possible alternative values of φ
greater than H, but also for possible alternative values of φ less than H.
Solution V, like IV, provides no protection at all against the possibility of
alternative values of φ less than H. The choice between Solutions V and
VI, therefore, clearly depends upon whether or not the conditions of the
problem demand a statistical test which will be sensitive to possible
alternative values of the parameter on both sides of H.

*In Solution IV, H = .25 and the possible alternative values were φ > .25.
In Solution V, H = .50 and the possible alternative values were φ > .50.


10.14 THE ARBITRARY ASPECTS OF STATISTICAL TESTS: A SUMMARY


In this section we shall attempt to pull together ideas developed in the 
foregoing sections by directing attention to the arbitrary decisions which 
enter into tests of statistical hypotheses. 


Arbitrary Decision 1: Choice of Statistic 

In presenting the various possible solutions to the problem of the prin- 
cipal and the superintendent, we have attempted to illustrate how different 
statistics may be applied to the solution of the same general problem. Two 
considerations are of major importance. First, it is essential that the 
statistic chosen be valid as an index of the general, as distinguished from the 
statistical, hypothesis involved. In the problem of the principal and the 
superintendent, for example, we might express the general hypothesis by 
saying that the population which will attend a particular elementary school 
during some limited period in the future (a period determined by the 
expected useful life of certain physical facilities and equipment) is pre- 
dominantly made up of children who are sufficiently retarded mentally to 
require special handling by a specially trained staff using special facilities and 
materials costing approximately X dollars. The statistical hypotheses which 
we tested in the different solutions represented a variety of attempts to 
express this general hypothesis in valid quantitative terms amenable to test. 
Some of our attempts are perhaps more valid in this sense than others. 
It may even be that there exist possible approaches not chosen by us that 
are more valid still. In any case, the practical usefulness of the statistical 
test depends upon the degree to which it provides a valid attack upon the 
general problem. 

The second consideration in the selection of a statistic has to do with
its efficiency in the sense of having a small standard error. The reason for 
this requirement was developed in the preceding section in the discussion 
accompanying Figure 10.8. 


Arbitrary Decision 2: Choice of Level of Significance (α)
The possibilities in choosing a level of significance are actually
unlimited since any conceivable probability value between zero and one may
be adopted. The consideration determining the choice is the relative
seriousness of the consequences of Type I and Type II errors. When Type I
errors appear to be the more serious, small values (0.001, 0.01, or 0.02)
are used. When Type II errors appear to be the more serious, larger values
(.10 or .20) are used. A commonly employed compromise value is .05. The
choice of a level of significance determines the degree of control over a Type
I error, which is the only type of error over which it is possible to exercise
complete arbitrary control. As we explained in the concluding paragraphs
of Section 10.11, the consequences of a Type I error are ordinarily more
serious than those of a Type II error. It is unusual, therefore, to find the
larger values (.10 or .20) employed. In fact, their selection should always
be accompanied by special justification.


Arbitrary Decision 3: Choice of Critical Region (R) 

For a given level of significance (α), the possibilities in choosing a
critical region (R) are unlimited. Any R, however chosen, will be as good
as any other R for controlling a Type I error if the same α applies, since for
all such R's the probability of the statistic falling in the region is α if the
hypothesis is true. However, we have shown in the foregoing sections that
all such R's are not equally effective with respect to controlling Type II
errors. Hence, the consideration governing the choice of R is the
effectiveness of the control it provides over Type II errors. The most effective R's
from this standpoint are those located at the extremes of the sampling
distribution of the test statistic. Whether R should be located entirely at
one end of the sampling distribution or divided into two portions, one
located at each end, depends on whether the general conditions of the
problem are such that the value of the parameter could differ from the
value hypothesized for it in only one or in both directions.

A word of caution is in order at this point. Because of the examples
with which we have introduced tests of statistical hypotheses, the student
may gain the impression that one-ended R's are commonly employed.
Actually in most research situations that involve tests of statistical
hypotheses (at least in psychology and education), the possible alternative
values of the parameter lie to either side of the value hypothesized. In
such situations, of course, a two-ended R is mandatory.


10.15 ESTIMATING SAMPLE SIZE 


In this section we shall present the steps involved in estimating the size
of sample necessary to bring the power of a statistical test up to some
desired level for a given discrepancy (D) between the value of the parameter
and the value hypothesized for it. The solution to this problem requires
that we first decide upon:

1. the value of α;

2. the location of R, i.e., whether R is to be located entirely at one end or
divided between both ends of the sampling distribution;

3. the value of the critical discrepancy, D, i.e., an amount such that if the
actual value of the parameter differs from the value hypothesized for it by
this amount, the probability of rejecting the hypothesis is 1 − β;

4. the value of β.


In addition it is necessary for us to obtain (either through previous
research or by means of a small preliminary sample) such information
about the population as may be necessary to an approximation of the
standard error of the sampling distribution of the statistic involved.

We shall first illustrate the procedure using the situation of Solution I
of the problem of the principal and the superintendent. Again we shall let
α = .01 and locate R entirely at the lower end of the sampling distribution
of the statistic, X̄. In addition we shall let D = −10, and β = .05. That
is, if the actual mean IQ for the population is 10 points below the
hypothesized value of μ = 100, we establish the probability of a Type II error at
.05. In terms of power this corresponds to P = .95 when μ = 90. Suppose
further that for a small preliminary sample of 10 cases the superintendent
obtains S = 19.1. Then a rough approximation of the population standard
deviation is:

    σ̃ = 19.1 √(10/9) = 20.1 [see (9.15)]

Now consider Figure 10.10. The normal curve on the right provides an
approximate model of the sampling distribution of the statistic, X̄, as it
would appear if the hypothesis, μ = 100, were true. The model is
approximate because σ̃ = 20.1 is not reliably determined when based on a small
preliminary sample. In this figure R_U is the upper limit of the critical
region (R). Note that if H is true the probability of X̄ falling in R is
α = .01, i.e., the probability of a Type I error is .01. It is clear that

    R_U = σ_X̄ z_α + μ_H
        = (20.1/√N)(−2.33) + 100 = −46.8/√N + 100

FIGURE 10.10 Diagram of situation involved in estimating N for
Solution I of the Problem of the Principal and the Superintendent

If, however, there is a discrepancy of D = −10 between the actual
value of the parameter and μ_H, then the normal curve at the left in Figure
10.10 is the approximate model of the sampling distribution. Note that in
this situation the probability of X̄ not falling in R is β = .05, i.e., the
probability of a Type II error is .05. We may now write:

    R_U = σ_X̄ z_β + (μ_H + D)
        = (20.1/√N)(+1.64) + (100 − 10) = 33.0/√N + 90

Now to estimate (roughly) the size sample necessary to provide the
specified control over a Type I error, as well as the specified control over a
Type II error in the case of the given critical discrepancy, it is necessary
only to equate these two expressions for R_U and to solve for N as follows:

    −46.8/√N + 100 = 33.0/√N + 90
    10√N = 79.8
    √N = 7.98
    N = 64

Thus, we see that the size sample actually used by the superintendent
(N = 65) was about right for the specifications selected in the above
example.
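
Equating the two expressions for R_U and solving gives √N = σ̃(z_β − z_α)/|D|, and the routine can be sketched in code as below. The function name `sample_size_mean` is an assumption of this illustration; the code carries σ̃ unrounded, so the intermediate figures differ slightly from 46.8 and 33.0, but the estimate rounds up to the same N.

```python
from math import ceil, sqrt


def sample_size_mean(sigma, d, z_alpha=-2.33, z_beta=1.64):
    """Rough N for a one-ended test of a mean, from sqrt(N) = sigma (z_beta - z_alpha) / |D|."""
    root_n = sigma * (z_beta - z_alpha) / abs(d)
    return root_n ** 2


# Rough estimate of the population standard deviation from the preliminary sample of 10.
sigma_est = 19.1 * sqrt(10 / 9)

n = sample_size_mean(sigma_est, d=-10)
print(f"estimated sigma = {sigma_est:.1f}, required N = {ceil(n)}")
```

Because N varies with the square of σ̃/D, halving the critical discrepancy roughly quadruples the sample required.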
As a second example, involving a different statistic and a two-ended R,
we shall estimate for the case of Solution VI to the problem of the
principal and the superintendent the size sample necessary for (1) an α of .01,
(2) a β of .05, and (3) a D of 0.1 in either direction.
Figure 10.11 represents the situation for a D-value of +0.1. It is
immaterial whether we work with a positive or negative D-value of 0.1 since
either leads to the same estimate of N. We begin by writing two expressions
for R_L, the lower limit of the upper portion of R.*

*When D = +0.1 the lower portion of R may be ignored since the probability of the
statistic (p) falling in it when φ = 0.6 is negligible.

FIGURE 10.11 Diagram of situation involved in estimating N for Solution
VI to the Problem of the Principal and the Superintendent

    (1) R_L = σ_p z_{α/2} + φ_H
            = √(.5 × .5/N)(+2.58) + .5
    (2) R_L = σ_p z_β + (φ_H + D)
            = √(.6 × .4/N)(−1.64) + (.5 + .1)

Now equating these two expressions and solving for N, we have:

    √(.5 × .5/N)(2.58) + .5 = √(.6 × .4/N)(−1.64) + (.5 + .1)
    (2.58)√(.25/N) + (1.64)√(.24/N) = .1
    (1.29 + .803)/√N = .1
    2.09/√N = .1
    4.37/N = .01
    .01N = 4.37
    N = 437
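
A sketch of the same solution in code follows. The function name `sample_size_proportion` is hypothetical. Note that the text rounds 1.29 + .803 to 2.09 before squaring, which gives 437; carrying the figures unrounded, as the code does, gives a value about one case larger. For a rough estimate of this kind the difference is immaterial.

```python
from math import sqrt


def sample_size_proportion(phi_h, d, z_crit=2.58, z_beta=1.64):
    """Rough N for a two-ended test of a proportion with critical discrepancy d.

    The portion of R on the far side of phi_h from the alternative is ignored,
    as in the text.
    """
    phi_alt = phi_h + d
    root_n = (z_crit * sqrt(phi_h * (1 - phi_h))
              + z_beta * sqrt(phi_alt * (1 - phi_alt))) / abs(d)
    return root_n ** 2


n = sample_size_proportion(0.5, 0.1)
print(f"required N is roughly {n:.0f}")
```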


Thus we see that to meet this selected set of specifications the
superintendent would have needed a sample of approximately 437. In Solution
VI we had him using only 100 cases. Our previous investigation of this
solution showed its power for D = +.1 to be only .278 (see Table 10.3).
In other words, if D = +.1, the probability of a Type II error when N = 100
is .722. Actually, sample size is not as important to the control of Type I
errors as it is to the control of Type II errors. For proper control of Type
I errors it is necessary only that the sample be large enough to justify the
use of the normal curve as a model of the sampling distribution. On the
other hand, as this example shows, sample size is extremely critical as a
factor controlling Type II errors. This follows as a result of the effect of
sample size upon the standard error (see discussion relating to Figure 10.8).
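
As a check on this conclusion, the power of the two-ended test of H = .5 against φ = .6 may be evaluated at both sample sizes. The sketch below is illustrative only, and its function name is an assumption; it uses unrounded normal areas, so the figure for N = 100 may differ from .278 in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist


def power_two_ended(n, phi_h=0.5, phi_true=0.6, z_crit=2.58):
    """Power of a two-ended test of a proportion for a sample of n cases."""
    se_h = sqrt(phi_h * (1 - phi_h) / n)
    dist = NormalDist(phi_true, sqrt(phi_true * (1 - phi_true) / n))
    lower, upper = phi_h - z_crit * se_h, phi_h + z_crit * se_h
    return dist.cdf(lower) + (1 - dist.cdf(upper))


for n in (100, 437):
    p = power_two_ended(n)
    print(f"N = {n}: power = {p:.3f}, probability of Type II error = {1 - p:.3f}")
```

The level of significance is the same in both cases; only the protection against a Type II error changes with N.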

One other comment is pertinent. The usefulness of this procedure in 
much psychological and educational research work is somewhat lessened
because of the difficulties encountered in determining an appropriate value 
for the critical difference (D) in terms of the type of scale units commonly 
involved. Whenever possible, however, it is advisable to attempt to 
establish some reasonably suitable value for D and to use the routine 
described to obtain at least some rough notion of the sample size necessary 
to the desired degrees of control over the two types of error. 


10.16 A PSYCHOLOGICAL PROBLEM*


A psychologist reviewing reports of experimentation on the effect of
punishment upon speed of learning was impressed by the fact that in
designing their experiments the researchers endeavored to associate the
punishments with failures and even with successes, but never with both failures
and successes at the same time. The experimental evidence appeared to be
clear that punishment following either failure or success increased the speed
of learning over that occurring when no punishment was involved. It
seemed to him that this might well be the result of the punishment itself
becoming a response cue so that the increase in speed of learning might be
explained in terms of differential secondary reinforcement rather than in
terms of drive heightened by anxiety induced by punishment. At least it
appeared to him that these factors must have been thoroughly confounded
(mixed) in the experiments thus far conducted. It occurred to him that
by punishing both successes and failures the possibility of differential
secondary reinforcement would be removed. Then any differences in speed
of learning as compared with a no-punishment situation would be due to
some motivational component such as anxiety induced by the punishment.
He reasoned, for example, that the anxiety thus induced might operate in
either of two ways: (1) it might heighten the drive to learn as quickly as
possible, or (2) it might so frustrate the subject that speed of learning would
be impeded. If the effect of punishing both successes and failures could be
shown experimentally to increase speed of learning, then it might be inferred
that the first of these ways dominates. If the effect of such punishment
could be shown to impede learning, then it might be inferred that the
second of these ways dominates. Finally, if the effect of such punishment
were nil, then it might be inferred that these ways either tend to cancel
each other out or are inoperative. In thus reviewing the situation, the
psychologist also reasoned that severity of punishment would operate as a
variable to influence the balance between these ways.

*The situation described in this example is based on experimental work reported
by C. M. Freeburne and J. E. Taylor ("Discrimination Learning with Shock for Right
and Wrong Responses in the Same Subjects," Journal of Comparative and Physiological
Psychology, Vol. 45, June 1952, pp. 264-268); and by C. M. Freeburne and Marvin
Schneider ("Shock for Right and Wrong Responses During Learning and Extinction in
Human Subjects," Journal of Experimental Psychology, Vol. 49, March 1955). It is hoped
that violences which have been done to psychological learning theory will be overlooked
in the interest of developing a pedagogical example.

The psychologist decided to attack the problem experimentally with 
two groups of human subjects: (1) a no-punishment group (NP), and (2) a
punishment group (P). As a learning task he decided to use a series of 20
successive right-left choices between two punch keys. He arbitrarily
selected the following series in which the total number of right (R) and
left (L) were the same: R L R R L L R L L R L R L L R R L R R L.

To indicate to the subject whether or not a given choice was correct he 
decided to rig his apparatus so that a buzzer tone would accompany each 
correct choice. Thus, the task involved trial-and-error learning of the 
correct sequence. As a form of punishment he decided upon an electric 
shock to be applied immediately following each choice regardless of whether 
it was correct or incorrect. He decided that by means of a preliminary 
trial he would attempt to determine the maximum shock each subject could 
stand without displaying evidence of severe discomfort. The punishment 
used with a given subject at the start of the experiment was this maximum 
shock as specifically determined for him. The experimenter further decided 
that during the course of the learning activity he would gradually increase 
the shock to compensate for the subject’s adaptation to it. In this way he 
hoped to induce and maintain a maximum anxiety without at the same 
time causing the complete disintegration of the learning situation. As a
criterion measure of speed of learning he decided to use the number of trials
required for two successive series of 20 correct choices.


10.17 A PSYCHOLOGICAL PROBLEM: EXPERIMENT I 


From a large class of college sophomores enrolled in an introductory
psychology course, the psychologist selected two groups of 50 and 65 at
random, and assigned them respectively to the punishment (P) and
no-punishment (NP) conditions. The criterion scores he obtained are shown
in Table 10.4.

Before we have the psychologist apply the technique of testing
statistical hypotheses to these data, we should consider the character of the
population or populations to which the findings may be generalized. Because the
problem lies in the field of human learning, the psychologist will naturally
wish to be able to generalize his findings as widely as possible, perhaps to
the entire population of all human beings capable of mastering the
particular task. The situation might be expressed as follows.

TABLE 10.4 Criterion Scores for Two Experimental Groups in
Experiment I on the Effect of Punishment on Speed
of Learning

    P GROUP          NP GROUP
    28 21 23 22      40 23 16 75 11
    19 18 17 18      63 34 16 40  7
     9 21 16 24       8 51 33 27 58

    ΣX² = 21,216
    (ΣX)²/n = 19,286.48


Suppose that all human beings capable of learning the task could
somehow be required to do so under the no-punishment condition. Next suppose
this learning, together with any experiences accruing from it that might
affect future learning, to be somehow completely extinguished from all
these people. Then suppose the task to be relearned by all these people
under the punishment condition. We thus generate two hypothetical sets
of learning scores. Although only one human population is involved, it
will be convenient for us to think of the two sets of performance scores
(one representing the totality of experience with human performance on a
learning task under one condition and the other under another condition)
as two populations of scores to which sample findings might be generalized.
Obviously these two populations of scores will be alike only if the effects of
the conditions are the same.


Now it is clear that if the psychologist wishes to generalize his sample
findings to two such hypothetical populations of scores, he is in the position
of wishing to generalize findings based on samples taken from one pair of
populations to a different pair of populations, for his samples must be
regarded as having been taken from two hypothetical populations of scores
such as might be generated from all sophomores enrolled in an introductory
course in psychology in a particular college at a particular time. Therefore,
before he can generalize to the populations of scores representing all human
beings, he must assume that these populations are respectively like those
from which he may be presumed to have selected his samples. Clearly, if
such an assumption is to be made, some justification is mandatory.

One important consideration in particular may be of help. There are,
of course, great individual differences in ability to learn. Some subjects
are able to master a given learning task more quickly than others,
regardless of differences in conditions. The issue at stake is not how Subject A,
learning under one condition, compares with Subject B, learning under
another, but rather how the over-all performance of the Condition 1
population compares with that of the Condition 2 population. The concern
moreover, and this is the helpful thing, is simply a matter of relative
comparison. There is, in the case of the particular problem at hand, no special
interest in the precise magnitude of the differences between whatever
indexes of over-all performance may be used. This magnitude, after all, is
obviously unique to the particular task, and, consequently, not likely to
be of general value. What actually matters is the answer to the question:
which, if either, of the two over-all indexes is the larger? Now while it may
be inconceivable that either of the hypothetical populations of scores
generated from the particular group of college sophomores is like its
counterpart generated from all human beings, it may not be at all inconceivable
that the difference between the over-all indexes is in the same direction for
both pairs of populations.

If this should still appear to the psychologist as too strong an assump- 
tion, his only recourse is to limit his generalization. He may still generalize 
to pairs of hypothetical populations of scores other than those sampled. 
For example, he might define the hypothetical populations of scores as if 
generated from all human adults living in the United States, or as if gen- 
erated from all human adults living in the United States who are from 20 
to 22 years of age, or as if generated from all college sophomores in the 
United States, or as if generated from all college sophomores enrolled in 
colleges of the same type as that which provided the subjects actually used, 
or as if generated from all such college sophomores enrolled in introductory 
psychology courses, or as if generated from the college sophomores actually 
studied plus all who will enroll in introductory psychology at the particular 
college involved during the next four or five years, etc. Note that each 
successive suggested source is more restrictive. However, it is also more 


304 TESTING STATISTICAL HYPOTHESES 


like the population actually studied and hence involves a more easily 
acceptable assumption. 

It is, of course, up to the psychologist to decide how far he wishes to 
generalize his findings. In any case, it is essential that he describe the popu- 
lation actually sampled so that other potential users of his findings will be 
in a position to make their own generalizations if his do not satisfy them. 
For purposes of this example we shall have him define his populations as if 
generated from all college sophomores enrolled in introductory psychology 
courses in colleges of the type that provided the subjects used. It is neces- 
sary in this case to assume that the subjects which were taken at random 
from among such students in one particular college are in effect a random 
sample from among such students in all such colleges. This assumption is 
not unreasonable in view of the particular learning task under investigation. 
It is important for the student to realize, however, that in the case of many 
learning tasks the over-all performances would differ markedly from college 
to college. That is, the populations would differ from college to college. 
When this is the case, the generalizations must be limited to such students 
as have attended or will attend this one particular college at a time not too 
far removed from the year of the experiment. 

We shall now show how the psychologist applied the technique of testing 
statistical hypotheses. 


STEP 1. Statement of hypothesis. 
The psychologist wished to compare the general level of two hypothetical 
populations of learning scores. As an index of general level, he 
arbitrarily selected the mean. Two means may be compared in different 
ways. For example, one mean may be said to be a certain number of times 
smaller than the other (ratio method). Or the difference between the two 
means may be observed (difference method). Since the psychologist was 
familiar with sampling-error theory as it applies to the difference between 
sample means (see Rule 9.7), he decided upon the latter method. Three 
possibilities existed: (1) that the conditions are on the average equally 
effective; (2) that the no-punishment condition (NP) is on the average the 
more effective; and (3) that the punishment condition (P) is on the average 
the more effective. If $\mu_P - \mu_{NP}$ represents the difference between the 
means of the hypothetical populations of P- and NP-scores, these three 
possibilities may be stated symbolically in terms of this difference as 
follows: 

(1) $\mu_P - \mu_{NP} = 0$ 
(2) $\mu_P - \mu_{NP} > 0$* 
(3) $\mu_P - \mu_{NP} < 0$* 

*Note that the fewer the number of trials required to learn, the faster the learning. 
Hence, a positive $\mu_P - \mu_{NP}$ difference indicates superiority for the NP condition, while 
a negative $\mu_P - \mu_{NP}$ difference indicates superiority for the P condition. 




The psychologist elected to test statistically the first of these possibilities. 
That is, he hypothesized that $\mu_P - \mu_{NP} = 0$. The alternatives are 
$\mu_P - \mu_{NP} > 0$, and $\mu_P - \mu_{NP} < 0$. 


STEP 2. Selection of $\alpha$. 

We shall assume this experiment to be the first of its kind. We shall 
further assume that at the time it was conducted the thought of punishing 
both successes and failures would have been viewed by most authorities as 
an extremely radical departure from sound practice. Consequently, the 
psychologist would have been very greatly concerned about rejecting H, 
i.e., possibility (1), especially in favor of possibility (3), if H were actually 
true. Being thus extremely anxious to avoid a Type I error, he elected to 
let $\alpha$ = .001. 


STEP 3. Specification of R. 

The situation as we have described it clearly calls for a two-ended R. 
The simplest way to specify it is in terms of z as a test statistic. However, 
to help the student see clearly the application of the sampling-error theory 
involved, we shall first have the psychologist specify it in terms of the test 
statistic $\bar{X}_P - \bar{X}_{NP}$. According to Rule 9.7, the sampling distribution of 
$\bar{X}_P - \bar{X}_{NP}$ tends toward a normal distribution as the sample sizes increase 
(the psychologist's samples of 50 and 65 are adequate for this theory). It 
further states that this distribution has a mean equal to the difference 
between the means of the two populations involved. This implies that if 
H is true, this distribution has a mean of zero. Finally, an estimate of its 
standard error may be made by means of formula (9.28). The computation 
of the values needed for (9.28) is outlined in Table 10.4. The $\sum x^2$ values 
were obtained by application of (6.6), and the $s^2$ values by application of 
(6.4). Substituting in (9.28) we have 

$$\tilde{\sigma}_{\bar{X}_P - \bar{X}_{NP}} = \sqrt{\frac{38.5904}{50 - 1} + \frac{582.5505}{65 - 1}} = \sqrt{.7876 + 9.1024} = \sqrt{9.8900} = 3.14$$
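
The arithmetic of this substitution can be checked with a short Python sketch. The function name `se_diff_means` is ours, not the text's, and the form of the computation (each variance divided by N - 1) is simply read off the substitution shown above rather than quoted from formula (9.28).

```python
import math


def se_diff_means(s2_a, n_a, s2_b, n_b):
    """Estimated standard error of the difference between two sample means.

    s2_a and s2_b are the sample variances as used in the substitution
    above; each is divided by its sample size minus one.
    """
    return math.sqrt(s2_a / (n_a - 1) + s2_b / (n_b - 1))


# Values taken from the worked example (samples of 50 and 65).
se = se_diff_means(38.5904, 50, 582.5505, 65)
print(round(se, 2))
```

Rounded to two places, the result agrees with the value 3.14 used in the remainder of the example.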


With this knowledge and information the psychologist was able to draw 
the sampling distribution approximately as it would appear, assuming H 
to be true. The sketch is shown in Figure 10.12. An obtained $\bar{X}_P - \bar{X}_{NP}$ 
difference greater than zero may be due either (1) to the chance composition 
of the particular samples drawn, or (2) to the fact that $\mu_P - \mu_{NP} > 0$, in 
which case H is false. This latter possibility, which indicates superiority 
for the NP condition, will be adopted should the obtained $\bar{X}_P - \bar{X}_{NP}$ 
difference fall into the upper portion of R. Similarly an obtained $\bar{X}_P - \bar{X}_{NP}$ 
difference less than zero may be due either (1) to chance as before, or (2) to 
the fact that $\mu_P - \mu_{NP} < 0$, in which case H is false. The latter possibility, 
which indicates superiority for the P condition, will be adopted if the 
$\bar{X}_P - \bar{X}_{NP}$ difference falls into the lower portion of R. 


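[Figure 10.12: normal curve on the $\bar{X}_P - \bar{X}_{NP}$ scale, centered at 0 with 
standard error 3.14; $R_U = -10.33$ and $R_L = +10.33$ are marked.] 

Figure 10.12 Approximate sampling distribution of $\bar{X}_P - \bar{X}_{NP}$ 
for $\mu_P - \mu_{NP} = 0$ showing lower bound ($R_L$) of upper portion and upper 
bound ($R_U$) of lower portion of R 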


Now, using a table of areas for a normal distribution (such as Table II, 
Appendix C), the psychologist found that .0005 (i.e., $\alpha/2$) of the area 
extends upward from z = +3.29 and that a like fraction extends downward 
from z = -3.29. He converted these two values into terms of the $\bar{X}_P - \bar{X}_{NP}$ 
scale by means of formula (8.5) thus: 

For $R_L$, the lower bound of the upper portion of R, 

$$R_L = (3.14)(+3.29) + 0 = +10.33$$

For $R_U$, the upper bound of the lower portion of R, 

$$R_U = (3.14)(-3.29) + 0 = -10.33$$

Hence, R is as follows: 

$$\bar{X}_P - \bar{X}_{NP} \geq +10.33 \quad \text{and} \quad \bar{X}_P - \bar{X}_{NP} \leq -10.33$$

To specify R in terms of the z-scale, we have only to write: 

$$z \geq +3.29 \quad \text{and} \quad z \leq -3.29$$
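
For the student who has a computer at hand, the two bounds of R can be reproduced without a printed table. The following is a minimal sketch that uses `NormalDist` from Python's standard `statistics` module in place of Table II; the function name `critical_region` and its arguments are our own labels.

```python
from statistics import NormalDist


def critical_region(alpha, se, hypothesized=0.0):
    """Return (upper bound of lower portion, lower bound of upper portion) of R.

    The critical z is the point above which alpha/2 of a unit normal
    distribution lies; it is rounded to two places, as a table would give it,
    and then converted to the scale of the statistic.
    """
    z_crit = round(NormalDist().inv_cdf(1 - alpha / 2), 2)
    lower_portion_bound = se * (-z_crit) + hypothesized
    upper_portion_bound = se * (+z_crit) + hypothesized
    return lower_portion_bound, upper_portion_bound


r_u, r_l = critical_region(alpha=0.001, se=3.14)
print(round(r_u, 2), round(r_l, 2))
```

With $\alpha$ = .001 and a standard error of 3.14 the sketch yields the same bounds as the hand computation.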


STEP 4. Determination of the value of the statistic. 
To determine the value of the statistic when R is specified in terms of 
the $\bar{X}_P - \bar{X}_{NP}$ scale, the psychologist has only to compute $\bar{X}_P$, $\bar{X}_{NP}$, and 
$\bar{X}_P - \bar{X}_{NP}$. He found $\bar{X}_P$ to be 19.64, and $\bar{X}_{NP}$ to be 37.58. Hence, 
$\bar{X}_P - \bar{X}_{NP} = -17.94$. 
Had R been expressed in terms of the z-scale the value of z would have 
been obtained by application of (8.4) as follows: 



$$z = \frac{(\bar{X}_P - \bar{X}_{NP}) - (\mu_P - \mu_{NP})}{\tilde{\sigma}_{\bar{X}_P - \bar{X}_{NP}}} = \frac{(19.64 - 37.58) - (0)}{3.14} = \frac{-17.94}{3.14} = -5.71$$


STEP 5. Decision. 

The psychologist now referred the value of the statistic (-17.94) 
to R and found it to lie in R. Hence, he rejected the hypothesis that 
$\mu_P - \mu_{NP} = 0$. The fact that the value of the statistic fell into the lower 
portion of R indicates further that the possibility $\mu_P - \mu_{NP} > 0$ may also 
be rejected, for the only acceptable explanations for a $\bar{X}_P - \bar{X}_{NP}$ difference 
of less than zero are chance or the fact that $\mu_P - \mu_{NP} < 0$. Chance is 
eliminated as an acceptable explanation when $\bar{X}_P - \bar{X}_{NP}$ falls in R. In 
other words, if the obtained $\bar{X}_P - \bar{X}_{NP}$ is so far below zero as to warrant 
rejection of zero, its value also warrants rejection of any hypothetical 
difference greater than zero, for it would be still further removed from any 
such hypothetical difference. Thus the only remaining possibility is that 
$\mu_P - \mu_{NP} < 0$. This, of course, means that learning occurred at a more 
rapid rate; that is, fewer trials were required under the punishment 
condition than under the no-punishment condition. 

Had the psychologist used z as the test statistic, he would have referred 
the obtained value of z (— 5.71) to R expressed in terms of z. In all respects 
the outcome is the same. 
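
Steps 4 and 5 may be summarized in a few lines of Python. This is only a sketch of the decision as described above; the function names and the wording of the returned strings are ours.

```python
def z_statistic(observed_diff, hypothesized_diff, se):
    """Express an observed difference between means on the z-scale."""
    return (observed_diff - hypothesized_diff) / se


def decide(z, z_crit=3.29):
    """Refer z to a two-ended rejection region bounded by -z_crit and +z_crit."""
    if z >= z_crit:
        return "reject H in favor of a difference greater than zero"
    if z <= -z_crit:
        return "reject H in favor of a difference less than zero"
    return "retain H"


# Sample means and standard error from the worked example.
z = z_statistic(19.64 - 37.58, 0.0, 3.14)
print(round(z, 2), "->", decide(z))
```

Because z lies below the lower critical value, the sketch reaches the same decision as the psychologist: H is rejected in favor of $\mu_P - \mu_{NP} < 0$.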


10.18 SOME POSSIBLE EXPLANATIONS OF THE RESULT $\bar{X}_P - \bar{X}_{NP} < 0$ 


In this section we shall list some of the possible reasons for the occurrence 
of an obtained value of $\bar{X}_P - \bar{X}_{NP}$ of less than zero, that is, an 
obtained difference in favor of the punishment (P) condition.* 


*It must be kept in mind that the criterion measure is such that small values indicate 
fast learning and large values slow learning. 

1. The particular set of individuals in the P-sample may have been more 
intelligent, and hence more rapid learners, than those in the NP-sample. 

2. The particular set of individuals in the P-sample may have had more 
previous experience with a learning task of the type involved than those in 
the NP-sample. 

3. The particular set of individuals in the P-sample may have been in better 
physical condition at the time of the experiment than those in the 
NP-sample. 

4. The experimenter may have unwittingly given the instructions to the 
subjects in such a way as to favor the P-condition. 

5. In scheduling the subjects the experimenter may have allotted more 
favorable times to the members of the P-sample. (For example, the P-sample 
subjects may all have been scheduled for about 9 a.m., a time of day when 
all were mentally fresh and alert. The members of the NP-sample, on the 
other hand, may all have been scheduled for about 1:00 p.m., a time of day 
when all were somewhat "logy" following noon lunch.) 

6. The room in which the P-condition was carried out may have been more 
conducive to learning (e.g., it may have been more quiet) than that used for 
the NP-condition. 

7. The P-condition may have been more favorable to rapid learning than the 
NP-condition. 


Now, of these possible reasons, only Items 1, 2, and 3 may be eliminated 
as a result of the outcome of the statistical test, that is, as a result of the 
obtained difference falling in R. These are each illustrative of reasons why 
individuals differ in their ability to learn a given task at a given time. 
Whether or not one or the other of the experimental groups has an ad- 
vantage as a result of any reason such as these depends entirely upon the 
operation of the chance or random factors which determine the selection 
of the particular individuals who comprise the particular samples studied. 
It is only reasons of this type that we eliminate when we reject statistical 
hypotheses. 

The psychologist upon rejecting H as a result of $\bar{X}_P - \bar{X}_{NP}$ falling in 
the lower portion of R would, of course, like to be able to point to Item 7 
above as the explanation. Before he can validly do this, however, he must 
be in a position to show that he has conducted his experiment in such a 
way as to have avoided such possible explanations as are illustrated by 
Items 4, 5, and 6 above. For example, he must be able to state that the 
same set of instructions was used with both experimental groups, that the 
time schedule was equally favorable to both, and that the same room was 
used by both. This, of course, requires careful planning. Any factor which 
might operate to give one group an over-all advantage over the other must 
be either eliminated or allowed to affect both groups equally. Failure to 
anticipate and take into account such factors has in the past voided much 
costly experimental work. 

In concluding this section, we shall present a term which, up to this 
point, we have not employed. We refer to the term significant, or preferably 
statistically significant, as it is commonly applied to an observed difference. 
When a statistical test leads to rejection of a hypothesis of no difference 
between corresponding parameters of two populations, the observed 
difference is said to be significant. When such a hypothesis cannot be thus 
rejected the observed difference is said to be non-significant. The term 
significant thus used simply implies that the observed difference differs 
from zero by an amount greater than can reasonably be explained in terms 
of random sampling fluctuation, that is, by an amount greater than can 
reasonably be explained by causes of the type represented by Items 1, 2, 
and 3 of the above list. Thus used, significant is a technical term, the 
meaning of which is not to be confused with that of the word significant 
as it is employed in common usage. The student should be extremely careful 
in interpreting findings regarding significant differences not to infer that 
all such differences are necessarily of practical importance or consequence. 
Clearly, statistical significance is a necessary condition to the practical im- 
portance of any observed difference. No difference which is of insufficient 
magnitude to warrant the elimination of chance sampling fluctuations as a 
possible explanatory cause can conceivably be of any practical importance. 
On the other hand, statistical significance can in no sense be regarded as a 
sufficient condition of the practical importance of an observed difference. A 
difference between the values of corresponding parameters of two popula- 
tions may exist and, if investigated by a sufficiently powerful statistical 
test, may give rise to a statistically significant observed difference. Yet 
this real difference may not be sufficiently large to be of any practical im- 
portance in the real world. 

For example, a sufficiently powerful statistical test might conceivably 
enable us to demonstrate that an observed difference in the mean heights 
of samples of United States and Canadian adult males was statistically 
significant. Yet the real difference in the mean heights of the two popula- 
tions involved would almost certainly be so small as to be of no practical 
importance whatever to, say, clothing manufacturers, who in spite of the 
statistically significant difference can use the same distributions of clothes 
sizes for the two populations. On the other hand, an observed significant 
difference between the mean height of a sample of adult United States 
males and that of a sample of adult Japanese males would almost certainly 
relate to a real difference of some practical consequence to clothing manu- 
facturers seeking to supply both markets. It should also be obvious in this 
connection that a much less powerful test would be sufficient to demonstrate 
significance in the case of the latter comparison than in the case of 
the former. 
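
The distinction between statistical and practical significance can be made concrete with a small Python sketch. All numbers in it are hypothetical and chosen only for illustration: a true difference of 0.1 inch in mean height, a standard deviation of 2.5 inches in both populations, and equal sample sizes. The standard error used is the familiar one for a difference between means of two independent samples of equal size.

```python
import math


def z_for_difference(diff, sd, n_per_group):
    """z for a difference between two independent means, equal n and equal sd."""
    se = sd * math.sqrt(2.0 / n_per_group)
    return diff / se


# Hypothetical figures: the same tiny difference, tested with increasing n.
for n in (100, 10_000, 100_000):
    print(n, round(z_for_difference(0.1, 2.5, n), 2))
```

The same trivial difference that yields a negligible z with 100 cases per group exceeds 3.29 with 100,000 cases per group. The difference has not become more important; the test has merely become more powerful.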


10.19 A PSYCHOLOGICAL PROBLEM: EXPERIMENT II 


In view of the outcome of Experiment I (Section 10.17), the psy- 
chologist wondered whether the punishment of both successes and failures 
was any more effective in increasing the speed of learning than punishment 
of failures alone. He decided to conduct a second experiment along the 
same lines as the first except that the experimental conditions would now 
involve (1) punishment of both successes and failures (PB) and (2) punish- 
ment of failures only (PF). From the students who had not participated 
in the first experiment he selected two groups of 50 at random and assigned 
them respectively to the PB and PF conditions. The criterion scores he 
obtained are shown in Table 10.5. 


TABLE 10.5 Criterion Scores for Two Experimental Groups in 
Experiment II on the Effect of Punishment on 
Speed of Learning 

     PB CONDITION             PF CONDITION 
 25  19  20  16  23       24  24  23  15  [?] 
 21  27  25  19  20       37  22  21  23  16 
 15  12  25  18  13       24  26  11  29  10 
 21  10  17  21  22       27  21  26  21  26 
 27  26  24  15  17       17  24  23  14  20 
 25  20  30  24  23       23  18  25  31  21 
 17  16  24  12  12       27  12  19  22  [?] 
 28  28  19  [?] 22       18  22  23  25   7 
 12  15   6  19  16       14  14  18  25  11 
 22  21  16  22  12       22  26  25  20  35 

                        PB            PF 
 $\sum X$               972           [?] 
 $\bar{X}$              19.44         22.14 
 $\sum X^2$             20,176        [?] 
 $(\sum X)^2/N$         18,895.68     [?] 
 $\sum x^2$             1,280.32      [?] 
 $s$                    [?]           [?] 

[Entries marked [?] are illegible in the source. The first row shows only 
nine of its ten values, and the position of the missing value is uncertain.] 

Since, save for the change in experimental conditions, the circumstances 
are the same as those of Experiment I, we shall simply outline the test of the 
null hypothesis involved without further comment. 
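
As a check on the summary figures of Table 10.5 that remain legible, the PB column may be recomputed from its two sums. The sketch below assumes only the identity the table rows themselves display, namely that the sum of squared deviations equals $\sum X^2$ less $(\sum X)^2/N$; the function name is ours.

```python
def sum_squared_deviations(sum_x, sum_x2, n):
    """Sum of squared deviations from the mean, from raw sums."""
    return sum_x2 - sum_x ** 2 / n


n = 50
sum_x, sum_x2 = 972, 20_176          # PB condition, Table 10.5

mean_pb = sum_x / n
correction = sum_x ** 2 / n
ssd_pb = sum_squared_deviations(sum_x, sum_x2, n)
print(mean_pb, correction, round(ssd_pb, 2))
```

The three printed values correspond to the PB entries for the mean, for $(\sum X)^2/N$, and for $\sum x^2$ in the table.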
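STEP 1. H: $\mu_{PB} - \mu_{PF} = 0$ 

Alternatives: (1) $\mu_{PB} - \mu_{PF} > 0$ 
              (2) $\mu_{PB} - \mu_{PF} < 0$ 

STEP 2. $\alpha$ = .001, as before. 

STEP 3. R: (in terms of z) 

$$z \geq +3.29 \quad \text{and} \quad z \leq -3.29$$

STEP 4. Computation of z for the sample at hand. 

$$z = \frac{(\bar{X}_{PB} - \bar{X}_{PF}) - (\mu_{PB} - \mu_{PF})}{\tilde{\sigma}_{\bar{X}_{PB} - \bar{X}_{PF}}} = -2.45$$

STEP 5. Retain hypothesis. (Why?) 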




10.20 REPORTING THE EXTREME AREA 


The outcome of Experiment II was inconclusive. The evidence did 
not justify the rejection of the hypothesis at the selected level of 
significance so that the psychologist must retain the possibility $\mu_{PB} - \mu_{PF} = 0$ 
in the list of tenable possibilities. Yet the fact remains that in the particular 
samples studied the PB condition was superior.* In fact, assuming the 
hypothesis to be true, the probability of a value for $\bar{X}_{PB} - \bar{X}_{PF}$ as large as 
the one obtained is rather small. This probability is represented by the 
area at the extremes of the normal distribution, that is, by the combined 
segments of area below and above z = -2.45 and z = +2.45. Using Table 
II, Appendix C, we may find this extreme area as follows: 


$$\begin{aligned}
EA &= P(z \leq -2.45 \mid ND: \mu = 0, \sigma = 1) \\
   &\quad + P(z \geq +2.45 \mid ND: \mu = 0, \sigma = 1) \\
   &= .0071 + .0071 \\
   &= .0142
\end{aligned}$$

This EA corresponds to the smallest value of $\alpha$ which could have been 
chosen and yet lead to a decision to reject H, given the particular collection 
of data at hand. This probability value is often included as part of the 
published findings of research investigations. The practice of reporting 
this value serves as a convenience for those readers who may disagree with 
the researchers' arbitrary choice of a level of significance ($\alpha$), and who 
consequently wish to know what the outcome of the test would be had some 
other value of $\alpha$ been selected. Thus, a reader who feels that an $\alpha$ of .05 
would have been appropriate in this experiment would in his own mind 
arrive at a decision to reject the hypothesis, a decision different from that 
made by the experimenter. The decision rule stated with reference to any 
arbitrarily selected $\alpha$ and its relation to EA is simply: 


Reject if $EA \leq \alpha$ 
Retain if $EA > \alpha$ 
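
The rule just stated is easily expressed in code. In the following sketch the function name `decision` is ours; the EA is that of Experiment II, and the several values of $\alpha$ stand for readers holding different views on the control of a Type I error.

```python
def decision(ea, alpha):
    """Apply the rule: reject if EA <= alpha, retain if EA > alpha."""
    return "reject" if ea <= alpha else "retain"


ea = 0.0142  # extreme area reported for Experiment II
for alpha in (0.001, 0.01, 0.05):
    print(alpha, decision(ea, alpha))
```

The experimenter's $\alpha$ of .001 leads to retention, as does an $\alpha$ of .01, while a reader who prefers .05 would reject. In every case $\alpha$ is fixed before the EA is consulted.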


*Keep in mind that the fewer the trials necessary to reach the learning 
criterion, the faster the learning. 

It is important that the student realize that under no circumstances 
can a researcher properly delay the choice of $\alpha$ until the EA has been 
determined. The degree of control to be exercised over a Type I error, while 
a matter of subjective judgment, ought always to be established with 
complete independence of the outcome of the statistical test. That is to 
say, the outcome of the test should in no way whatever influence the 
selection of $\alpha$. In the theory of testing statistical hypotheses the level of 
significance, $\alpha$, is an arbitrarily selected constant and not a variable. It 
should never be confused with the EA-value, which, of course, varies from 
sample to sample, and this EA-value should in no case be referred to 
as a level of significance. No information is ever gained as a result of 
conducting an experiment that provides any additional basis for the selection 
of an $\alpha$-value, all information bearing on this selection being available prior 
to the actual analysis of any particular collection of data. It is for this 
reason that the selection of $\alpha$ has been established as a second step in the 
procedure. Coming as it thus does, prior to the collection and analysis of 
the data, the temptation to manipulate $\alpha$ to fit the findings is removed. 

It is important that small EA-values be interpreted with some degree 
of caution. A small EA is, of course, associated with a large absolute value 
of the test statistic z. While such a z implies a small likelihood of a Type I 
error in rejecting H, it does not necessarily also imply that the discrepancy 
between H and the observed value of the corresponding statistic (S) is of 
any practical importance in the real world. A large absolute value of z may 
result from a small difference between S and H provided $\sigma_S$ is very small, 
that is, provided the test is very powerful. It is indeed tempting to interpret 
large absolute z-values or small EA-values as implying a difference of 
great practical importance between S and H. Actually, such an interpretation 
may be quite invalid.* 


10.21 A PSYCHOLOGICAL PROBLEM: EXPERIMENT III 


In considering the outcome of Experiment II (a difference of 2.7 in 
favor of the mean of the PB condition, with which an EA of .0142 was 
associated), the psychologist wondered if perhaps the decision had 
resulted in a Type II error. That is, he wondered if perhaps the PB condition 
was actually more effective in increasing the speed of learning than the 
PF condition. He realized that such an error might have occurred as a 
sampling accident. Such an accident would result if, in spite of a general 
or average superiority of the PB condition for the population, the particular 
PB sample just happened to contain an unusual number of individuals who 
under any condition were inferior as learners to those in the particular PF 
sample. He realized further that the probability of such a chance occurrence 
could be reduced by increasing the power of the statistical test. He 
decided, therefore, to run Experiment II a second time and to employ, in 
so doing, a variation in the experimental design which would reduce the 
standard error and thus increase the power of the test.† 
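
The relation between the standard error and the power of the test may be illustrated by a brief Python sketch. It assumes a two-ended z-test and a normal sampling distribution, and the true difference and the two standard errors used below are hypothetical values chosen only for illustration; the function name is ours.

```python
from statistics import NormalDist


def power_two_ended(true_diff, se, alpha):
    """Probability that z falls in a two-ended R when the true difference
    between the population means is true_diff (hypothesized difference 0)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = true_diff / se
    upper = 1 - nd.cdf(z_crit - shift)   # area in upper portion of R
    lower = nd.cdf(-z_crit - shift)      # area in lower portion of R
    return upper + lower


# Hypothetical: same true difference, two different standard errors.
for se in (1.1, 0.8):
    print(se, round(power_two_ended(-2.5, se, 0.001), 3))
```

With the true difference held constant, the smaller standard error gives the larger power, which is precisely the psychologist's reason for seeking a design that reduces the standard error. When the true difference is zero the function returns $\alpha$ itself.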

*In this connection the student is advised to reread the concluding remarks of Section 
10.18. 

†It should be noted that he could have accomplished this within the framework of the 
design previously used by simply increasing the numbers of cases in the samples. See 
pp. 291-293. 

The variation which the psychologist decided to employ consisted in 
an attempt to control one of the possible causes of difference between the 
two sets of individuals who would comprise the particular samples to be 
studied. The particular cause which he elected to control, and which in the 
previous experimental design was present as a chance or random factor, 
was the intelligence of the subjects (see Item 1, Section 10.18). In theory 
this would have to be accomplished by some process such as the following: 
Step 1. Select an individual at random from the population and obtain for 
him a measure of the amount of the control variable (intelligence in 
this example) he possesses. 
Step 2. From among the subset of individuals in the population who possess 
this same measured amount of the control variable, select one at 
random and pair him with the individual selected in Step 1. 
Step 3. Repeat Steps 1 and 2 until the desired number of pairs of individuals 
is obtained. 
Step 4. By a random process (e.g., the toss of a coin) assign the members of 
the pairs to the two experimental groups. 
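
The four steps can be sketched in Python as follows. The sketch is offered only as an illustration under stated assumptions: the pool of subjects is represented as a dictionary from a hypothetical subject label to a score on the control variable, the subject for whom no exact match exists is discarded (the first of the two alternatives discussed below), and the names `select_matched_pairs`, `pool`, and `rng` are ours.

```python
import random


def select_matched_pairs(pool, n_pairs, rng):
    """Select pairs matched exactly on the control variable and assign the
    members of each pair at random to the PB and PF conditions."""
    available = dict(pool)
    pairs = []
    while len(pairs) < n_pairs and len(available) >= 2:
        # Step 1: select an individual at random and note his score.
        first = rng.choice(sorted(available))
        score = available.pop(first)
        # Step 2: select at random from those with the same score.
        matches = sorted(s for s, v in available.items() if v == score)
        if not matches:
            continue  # no match exists: discard this subject and start over
        second = rng.choice(matches)
        del available[second]
        # Step 4: a random process decides which member goes to which group.
        members = [first, second]
        rng.shuffle(members)
        pairs.append({"PB": members[0], "PF": members[1], "score": score})
    return pairs  # Step 3 is the loop itself


# Hypothetical pool: 60 subjects, six at each of ten score levels.
pool = {f"s{i}": 90 + i % 10 for i in range(60)}
pairs = select_matched_pairs(pool, 20, random.Random(1))
print(len(pairs), pairs[0])
```

Each pass through the loop removes at least one subject from the pool, so the procedure always terminates, though it may end with fewer pairs than were requested when the pool is small.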


The student will at once recognize the practical impossibility of carry- 
ing out this process in a real situation. In the first place, in a real situation 
the total pool of individuals available for experimental purposes (e.g. 
college sophomores enrolled in an introductory psychology course) is not 
usually the population to which it is desired to generalize, but is rather by 
assumption a random sample from this population. Thus in Step 1, the 
individual referred to is selected at random from a sample rather than from 
the population. If this sample is, as assumed, a random sample from the 
population we may, without too much violence to the process described, 
assume practical compliance with Step 1. The principal difficulty arises in 
Step 2. The population subset referred to in this step is, of course, not 
available. The experimenter may be able to identify a sample subset from 
which a matching subject may be randomly selected. It may be, however, 
that there is no individual in the sample pool having the same measured 
amount of the control variable as the individual selected in Step 1. This is 
particularly likely to occur when the available sample pool is not very large. 
When this situation arises there is no way to effect Step 2. The experi- 
menter can, of course, either discard the subject initially selected and start 
over or select some subject who matches him approximately. The former 
alternative is to be preferred over the latter. If he proceeds according to 
this first alternative he is in effect selecting matching pairs at random from 
the matching pairs in the sample pool. Only if the matching pairs in the 
sample pool may reasonably be assumed to be in turn a random sample of 
such pairs as they exist in the population can the conditions necessary to 
the use of a control variable be regarded as having been satisfied. 

Although we shall proceed with our example assuming the conditions 
necessary for the analysis to be satisfied, it is important for the student to 
recognize that this is not likely to be the case in many real situations. The 
method of analyzing the data which we shall present is, nevertheless, a 
most important one, for it is the appropriate procedure to follow in 
situations in which the two experimental treatments may both be applied to 
the same individual* or in which before- and after-treatment scores are to 
be compared for a given sample of individuals. In such situations the 
individual subject is, of course, paired or matched with himself, and the 
problems associated with obtaining a random set of accurately matched 
pairs do not exist. It is in situations of this type that the student will find 
the statistical test about to be described to be most useful. 

In our illustrative example, let us say that the psychologist decided to 
delay running the experiment until the fall semester of the following year 
so that an entirely new class of students in introductory psychology would 
be available from which he could select his sample of matched pairs. 
Shortly after the opening of this term he administered an intelligence test 
to all these students. He then selected at random a single student from 
among them. Next, he selected at random a single student from among 
the subgroup of students whose scores on this test were the same as that 
of the first student selected (we are, of course, assuming the existence of 
such a subgroup in the sample pool—i.e., the psychology class). These two 
students, matched or equated in intelligence as measured by their per- 
formance on the test, became the first pair of subjects selected. The 
psychologist repeated this procedure until he had in all selected 50 pairs of 
subjects matched on the basis of their intelligence-test scores. He then 
randomly assigned one member of each pair to the PB condition and the 
other to the PF condition. 

*This, of course, implies that the outcome of the administration of either one of the 
experimental treatments to a subject has no effect upon the outcome of the 
administration of the other. 

TABLE 10.6 Criterion Scores and Differences Between Them for 
Two Matched Groups in Experiment III on the 
Effect of Punishment on Speed of Learning 

 PB   PF    D        PB   PF    D        PB   PF    D 
 23   26   -3       [?]  [?]   -9       [?]   26  [?] 
 18   16   +2        10   24  -14        24   22   +2 
 27   21   +6        25   25    0       [?]  [?]  [?] 
 30   25   +5        24   24    0        21   21    0 
 15   24   -9        12   20   -8       [?]  [?]    0 
 16   14   +2        31   25   +6       [?]  [?]  [?] 
[?]  [?]  [?]        20   24   -4        15  [?]  [?] 
 24   25   -1        18   14   +4       [?]  [?]  [?] 
 19   24   -5        24   21   +3        22   18   +4 
 22   26   -4       [?]  [?]   -9       [?]  [?]  [?] 
 20   15   +5        25   31   -6 
 20   28   -8       [?]  [?]   -2 
 12   18   -6       [?]  [?]  [?] 
[?]   20  [?]        19   15   +4 
 30   24   +6       [?]  [?]   -7 
 13   21   -8        27   30   -3 
[?]  [?]    0        16   23   -7 
 17   25  [?] 

 $\sum(-D) = -185$ 
 $\sum(+D) = +57$ 
 $\sum D = -128$ 
 $\bar{D} = -2.56$ 
 $\sum D^2 = 1,796$ 
 $(\sum D)^2/N = 327.68$ 
 $\sum d^2 = 1,468.32$ 
 $s_D = 5.42$ 

[Entries marked [?] are illegible in the source, and some rows of the 
original 50 pairs are missing.] 

The criterion scores for each pair of subjects on the same learning task 
as was used in the preceding experiment, together with the differences 
($D = X_{PB} - X_{PF}$) between these scores for each pair, are shown in Table 
10.6. It is clear that had the psychologist picked both members of each pair 
purely at random without equating them, the expected variability of the 
D-values would be greater than that of D-values based on pairs equated 
with reference to some factor causing variation. This follows from the fact 
that one of the factors causing variation in the D-values derived from purely 
random pairs is eliminated by the equating process from the D-values 
derived from matched pairs. Consequently, the standard error of the 
sampling distribution of the means of samples of D-values derived from 
equated pairs must be smaller than that of the sampling distribution of 
means of samples of D-values derived from purely random pairs. Thus a 
test of the hypothesis that the mean of a population of D-values is zero is 
more powerful when the D-values are derived from equated pairs than 
when they are derived from random pairs. 
The degree to which an increase in power may be achieved by equating 
depends upon the extent to which the equating factor contributes to 
variation in the performances of individual subjects on the experimental task. 
If this factor has little to do with individual variation in performance on 
this task, the effect of equating upon the variability of D-values will be 
slight. That is, there will be little difference between the variability of D- 
values derived from equated pairs and that of D-values derived from purely 
random pairs. On the other hand, if the equating factor is one of the major 
factors contributing to individual differences in performance on the experi- 
mental task, the variability of D-values derived from equated pairs will be 
considerably smaller than that of D-values derived from purely random 
pairs. It is important, therefore, if an increase in power is to be achieved, 
that the factor with reference to which the members of the pairs are equated 
be one that makes an appreciable contribution to individual differences 
in performance on the experimental task. Unless this is the case there is 
little to be gained through application of this equating procedure. 
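
The effect of equating upon the variability of D-values may be seen in a small simulation. The model is an assumption made purely for illustration: each score is taken to be the sum of an "ability" component (the equating factor) and an independent random component, with hypothetical standard deviations of 4 and 3. In an equated pair both members share the same ability component; in a random pair they do not.

```python
import random
from statistics import pvariance


def simulate_d_variances(n_pairs=5000, ability_sd=4.0, noise_sd=3.0, seed=0):
    """Return (variance of D for equated pairs, variance of D for random pairs)."""
    rng = random.Random(seed)
    equated, unequated = [], []
    for _ in range(n_pairs):
        shared = rng.gauss(0, ability_sd)          # same ability for both members
        x1 = shared + rng.gauss(0, noise_sd)
        x2 = shared + rng.gauss(0, noise_sd)
        equated.append(x1 - x2)
        y1 = rng.gauss(0, ability_sd) + rng.gauss(0, noise_sd)
        y2 = rng.gauss(0, ability_sd) + rng.gauss(0, noise_sd)
        unequated.append(y1 - y2)
    return pvariance(equated), pvariance(unequated)


var_equated, var_random = simulate_d_variances()
print(round(var_equated, 1), round(var_random, 1))
```

Under these assumptions the variance of D for equated pairs is far smaller than for random pairs. If `ability_sd` is made small relative to `noise_sd`, the two variances become nearly equal, which is the situation described above in which little is gained by equating.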
Understanding of the experimental design under consideration requires 
further that the student appreciate the fact that the mean of a population 
of D-values is the same as the difference between the means of the two 
populations of X-values which form the pairs of scores. Symbolically stated 
in terms of our example, 

$$\mu_D = \mu_{PB} - \mu_{PF}$$* 

where $D = X_{PB} - X_{PF}$. 
Hence, whether we test a hypothesis about $\mu_{PB} - \mu_{PF}$ as we did in 
Experiment II, or about $\mu_D$ as we now propose to do, we are actually testing a 
hypothesis about the same value. In other words, testing the hypothesis 
that $\mu_D = 0$ is the equivalent of testing the hypothesis that $\mu_{PB} - \mu_{PF} = 0$. 
We shall now present, step by step, the procedure followed by the psychologist in testing this hypothesis using the data of Experiment III (see Table 10.6).

STEP 1. H: $\mu_D = 0$
Alternatives: (1) $\mu_D > 0$
(2) $\mu_D < 0$

STEP 2. $\alpha = .001$, as before.

STEP 3. R: (in terms of z) $z \geq +3.29$ and $z \leq -3.29$.

STEP 4. Computation of z for the sample of D-values at hand.†

$$z = \frac{\bar{D} - \mu_D}{\sigma_{\bar{D}}}$$

Applying (9.25) we obtain

$$\tilde{\sigma}_{\bar{D}} = \frac{s_D}{\sqrt{N-1}} = \frac{5.42}{\sqrt{50-1}} = .774$$

(Note: N here is the number of D-values, i.e., the number of pairs.)

Hence

$$z = \frac{-2.56 - 0}{.774} = -3.31$$

STEP 5. Reject the hypothesis. (Why?)

*The proof of this statement follows as an application of Rule 5.2. In Rule 5.2, let the n individuals correspond to n pairs, and let m = 2. Then, instead of having m scores for each of n individuals, we have two scores for each of n pairs of individuals. We shall represent these two scores by $X_1$ and $X_2$, respectively. Then, by Rule 5.2,

$$\bar{S} = \bar{X}_1 + \bar{X}_2 \qquad (1)$$

where $S_i = X_{1i} + X_{2i}$ = sum of scores for pair i,
$\bar{X}_1$ = mean of $X_1$ values, and
$\bar{X}_2$ = mean of $X_2$ values.

Now let each $X_2$-value be multiplied by the constant factor $-1$. Then the new S-values are given by

$$S_i = X_{1i} - X_{2i} = D_i$$

But by Rule 5.4, the mean of the $X_2$-values thus modified is $-1$ times the original mean, or $-\bar{X}_2$. Substituting in (1) we have

$$\bar{D} = \bar{X}_1 - \bar{X}_2$$

†See Table 10.6 for basic computations.




FIGURE 10.13 Power curves of tests used in Psychological Experiments II and III. (Vertical axis: power, P-scale, with the level P = .95, $\beta = .05$, marked; horizontal axis: scale of possible values of $\Delta = \mu_D = \mu_{PB} - \mu_{PF}$, from $-7$ to $+7$.)


It is important to note that since the obtained z-value falls in the lower portion of R, the rejection of $\mu_D = 0$ implies also rejection of any $\mu_D$-value greater than zero. Hence, the only remaining possibility is that $\mu_D < 0$, and the psychologist is able to report the finding that the PB condition is more effective in increasing the speed of learning the experimental task than the PF condition.

TABLE 10.7 Values of P Corresponding to Differences ($\Delta$) Between H and Selected Possible Alternative Values of $\mu_D = \mu_{PB} - \mu_{PF}$ for Psychological Experiments II and III

   Δ        Experiment II    Experiment III
   0            .001             .001
 ± 0.5          .002             .004
 ± 1.0          .009             .023
 ± 1.5          .027             .087
 ± 2.0       (illegible)         .239
 ± 2.5          .154             .476
 ± 3.0          .288             .719
 ± 3.5          .456             .891
 ± 4.0          .637             .969
 ± 4.5          .788             .994
 ± 5.0          .894             .999
 ± 5.5          .956
 ± 6.0          .984
 ± 6.5          .996
 ± 7.0          .999


We thus have an example showing how the power of a statistical test 
may be improved without increasing sample size by means of an experi- 
mental design involving equated groups. The designs of Experiments II 
and III are equally effective insofar as control over a Type I error is con- 
cerned, but the latter is superior in the degree of control exercised over a 
Type II error. The power curves for these two tests are shown in Figure 
10.13. The P-values plotted are given in Table 10.7.* It may, for example, 
be seen from Figure 10.13 that for P = .95 (i.e., for $\beta$ = probability of a
Type II error = .05) the discrepancy between H and $\mu_D$ would have to be
5.42 trials in the case of Experiment II as compared with only 3.82 trials 
in the case of Experiment III. 
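
The way such P-values arise can be sketched numerically. The following is only an illustrative sketch, assuming the normal-curve model used throughout this chapter, the two-tailed critical value 3.29, and the estimated standard error .774 of Experiment III; the function names are hypothetical and are not part of the text.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Area of the unit normal curve below z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def power_two_tailed(delta, std_error, z_crit=3.29):
    """Probability that z falls in R (z <= -z_crit or z >= +z_crit)
    when the true parameter differs from H by `delta`."""
    shift = delta / std_error
    lower_part = normal_cdf(-z_crit - shift)
    upper_part = 1.0 - normal_cdf(z_crit - shift)
    return lower_part + upper_part


# Assumption: .774 is the standard error obtained in Step 4 for Experiment III.
SE_EXPERIMENT_III = 0.774

for delta in (0.0, 1.0, 2.0, 3.0, 3.82):
    print(f"delta = {delta:4.2f}  P = {power_two_tailed(delta, SE_EXPERIMENT_III):.3f}")
```

Under these assumptions the power at a discrepancy of 3.82 trials comes out at about .95, in agreement with the figure quoted above; small differences from tabled values are to be expected because the tables are rounded.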


10.22 A PROBLEM INVOLVING THE COMPARISON OF TWO PROPORTIONS


An investigator was interested in comparing the educational achievement of present-day high school students with that of the high school students of twenty to twenty-five years ago.† He located certain achievement tests which had been used in certain high schools twenty to twenty-five years ago and for which results were still available. He repeated these tests with students currently enrolled in these same schools.

One of the tests thus repeated was a proofreading test of English Correctness originally given in 1931. One of the sentences in the test copy read:

"In my own case my greatest triumph has been the study of the old ways of working mettle."

The investigator discovered that in a random sample of 1,000 students taking this test in 1931, .36 had detected and properly corrected the spelling error involved. He further found that in a random sample of 500 students taking this same test in 1954, .54 detected and properly corrected this particular error. He wished to determine whether or not the difference in these two proportions was larger than could reasonably be attributed to random sampling fluctuation. To accomplish this he tested the statistical hypothesis that the proportions for the two populations represented were the same. The procedure used and results obtained were as follows.


STEP 1. H: $\phi_{31} - \phi_{54} = 0$‡
Alternatives: (1) $\phi_{31} - \phi_{54} > 0$
(2) $\phi_{31} - \phi_{54} < 0$

*It is suggested that the student verify some of these values.
†Joseph R. Sligo, Comparison of Achievement in Selected High School Subjects in 1934 and 1954, Unpublished doctoral dissertation, State University of Iowa, 1955.
‡The subscripts 31 and 54 identify the 1931 and 1954 groups.



STEP 2. $\alpha = .01$

In justifying this choice the investigator wrote: “It was felt that the 
mistake of retaining an hypothesis of no difference between then and now 
populations when such a difference actually exists would be of less serious 
consequence than the converse error of rejecting such an hypothesis when
it was actually true. Hence, to guard against the type of error felt to be 
the more serious, a .01 value was chosen as the critical level of significance.” 
STEP 3. R: $z \leq -2.58$ and $z \geq +2.58$

STEP 4. Computation of z.

Here

$$z = \frac{(p_{31} - p_{54}) - (\phi_{31} - \phi_{54})}{\sigma_{p_{31} - p_{54}}}$$


The standard error of the difference between two proportions may be 
estimated by means of (9.29). In this particular situation, however, the 
two population proportions involved are hypothesized to be equal. Hence, 
if the data are to be analyzed in a manner consistent with the hypothesis, 
the same value should be used for both $p_1$ and $p_2$ in formula (9.29).

The value to be so used should be the best possible estimate of the proportion hypothesized to be common to both populations that can be derived from the data at hand. This estimate is simply the proportion for both samples considered as one. Since $p_{31}$ and $p_{54}$ are means (see footnote, page 253) the simplest way to obtain p for both samples combined is by application of (5.4). As it applies to the problem at hand (5.4) may be written:

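$$p = \frac{n_{31}p_{31} + n_{54}p_{54}}{n_{31} + n_{54}} = \frac{(1000)(.36) + (500)(.54)}{1000 + 500} = .42$$

Now application of (9.29) gives

$$\tilde{\sigma}_{p_{31} - p_{54}} = \sqrt{\frac{(.42)(1 - .42)}{1000 - 1} + \frac{(.42)(1 - .42)}{500 - 1}} = \sqrt{.0002438 + .0004882} = \sqrt{.0007320} = .027$$

$$z = \frac{(.36 - .54) - (0)}{.027} = -6.67$$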


STEP 5. Reject hypothesis. (Why?)

Note that in this situation rejection of $\phi_{31} - \phi_{54} = 0$ also implies rejection of the alternative possibility that $\phi_{31} - \phi_{54} > 0$. (Why?) Hence, the investigator concluded that $\phi_{31} - \phi_{54} < 0$, that is, that the proportion of success on this particular test item was greater in the 1954 population than in the 1931 population.
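
The arithmetic of Step 4 can be retraced with a short sketch. It assumes the pooled-proportion form of (5.4) and the N - 1 denominators of (9.29) as they are applied in this problem; the function name is hypothetical.

```python
from math import sqrt


def pooled_two_proportion_z(p1, n1, p2, n2):
    """Return (pooled p, estimated standard error, z) for H: phi1 - phi2 = 0."""
    # Proportion for both samples considered as one, as in (5.4).
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    # The same pooled value is used for both samples, as in (9.29).
    variance = pooled * (1 - pooled) / (n1 - 1) + pooled * (1 - pooled) / (n2 - 1)
    std_error = sqrt(variance)
    z = ((p1 - p2) - 0.0) / std_error
    return pooled, std_error, z


pooled, std_error, z = pooled_two_proportion_z(0.36, 1000, 0.54, 500)
print(f"p = {pooled:.2f}, standard error = {std_error:.3f}, z = {z:.2f}")
```

Because the sketch carries the standard error unrounded, its z differs slightly from a value computed with the rounded .027; in either case z falls far below the lower critical value of -2.58.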




INTERVAL ESTIMATION 


11.1 INTRODUCTION 


In the preceding chapter we have considered the problem of tests appropriate for the purpose of determining whether or not certain logical a priori values, called hypotheses, were tenable as values of certain population parameters. The whole procedure was based on the premise that such logical a priori values did exist. Occasions arise in which information regarding the magnitude of some population parameter is of great interest but in which no logical a priori notion exists regarding its possible value. In such situations there can be no hypotheses to test and the problem becomes simply one of making the most informative statement possible about the magnitude of the parameter by studying a sample. Such problems are problems of statistical estimation.

But situations in which there are no logical hypotheses to test are not the only situations in which statistical estimation plays the major role. In fact, they are not even the most important. Consider, for example, a situation in which a hypothesis (H) regarding a parameter ($\theta$) has been tested and rejected. Let us suppose that the value of the statistic (S) was considerably greater than H so that rejection implies elimination of the possibilities (1) $\theta = H$, and (2) $\theta < H$. This leaves us with the knowledge that $\theta > H$, provided, of course, that an error has not occurred. But the simple fact that $\theta$ is greater than some value H may not satisfy our need for knowledge about $\theta$. In fact, it may represent only a crude preliminary first stage in the development of some theory. As this development progresses toward refinement, the critical issue may well be not that $\theta > H$ but rather precisely how much greater. In other words, the determination of the fact that $\theta > H$ may represent only a crude preliminary first step in a search for knowledge about $\theta$. A natural second step consists of making the best possible estimate of the magnitude of $\theta$ from the information contained in a sample. In this chapter we shall be concerned with the problem of making such estimates. Although a comprehensive attack upon this problem is beyond the scope of this book, the student should not minimize its great importance. In fact, the more refined the theories with which he seeks to deal, the more important will the issues involved in statistical estimation become.

There are two approaches to the problem of estimating the magnitude of some population parameter, $\theta$, from the information contained in a sample: (1) the point or single-valued approach, and (2) the interval or range-of-values approach. The first approach yields a single value which according to some criterion or criteria is the "best" estimate that can be made from the information contained in the sample. Since the selection of a criterion (that is, of a definition of "best") is arbitrary, and since a number of possibilities exist, there are a variety of ways in which point or single-valued estimates of $\theta$ may be obtained from a sample. Some of these ways lead to the use of S, the sample fact (statistic) corresponding to $\theta$; others do not. Some indicate the use of S in the case of some parameters and not in the case of others. The theory of point estimation is extensive and is, to a large degree, based on fairly advanced mathematical concepts. Treatment of this theory is, therefore, beyond the scope of this book. In those situations in which we may find it necessary to employ point estimates, we shall be content simply to accept and apply the theorists' findings. This we have already done, for example, in estimating the standard errors of the various sampling distributions we have used in testing statistical hypotheses. Rules 9.9 and 9.10 and formulas (9.15) through (9.29) all provide point estimates of certain population parameters.


The second approach involves the determination of an interval, or range of values, within which the "true" or population value is presumed to fall. Such intervals may be prescribed simply in terms of their lower and upper limits or bounds. Thus we might present the values 90 and 95 as the limits of an interval presumed to contain the value of the mean, $\mu$, of some population. In presenting such limits, we are, in effect, saying that, according to the information contained in a particular sample, $\mu$ is probably some value in the interval 90 to 95. This approach has the advantage not only of implying the fact that estimation is involved, but also, through the width of the interval, of providing some indication of the accuracy of the estimation. For example, to present the interval 85 to 100 as an estimate of $\mu$ suggests a less accurate estimate than is provided by the interval 90 to 95. Interval estimates are at a disadvantage (in fact, cannot be used) when the estimate is required for use in subsequent calculation. For example, an estimate of the population standard deviation is needed in order to estimate the standard error of the mean, which in turn is used in estimating the value of the test statistic z that is referred to a critical region R in testing a hypothesis about the mean of some population. The theory of testing statistical hypotheses requires that all of these estimates ($\tilde{\sigma}$, $\tilde{\sigma}_{\bar{X}}$, and z) be single-valued. However, in all situations in which single-valued estimates are not thus required, interval estimates are to be preferred. In this chapter we shall consider only the technique of interval estimation.


11.2 INTRODUCTION TO THE CONCEPT OF A CONFIDENCE INTERVAL


Let $\theta$ represent the value (unknown to us) of some population parameter which we wish to estimate. That is, we wish to determine from the information contained in a sample an estimate of $\theta$. We shall use the interval approach. This implies that we must determine lower and upper bounds or limits in such a way that we can be reasonably confident that $\theta$ lies between them. Before this can be accomplished it is necessary to indicate more precisely what is meant by "reasonably confident."

In the first place, it may be helpful to note that it would be a simple matter to specify the limits of an interval which would be absolutely certain to contain $\theta$. All we need do is write $-\infty$ and $+\infty$ for the lower and upper limits respectively. Of course, such an interval is of no use whatever as an estimate. It is like describing the location of New York City as "somewhere in the universe." Such statements may obviously be made without collecting any information at all. We have available for use the information contained in our sample, and we would certainly be willing to sacrifice some degree of certainty to secure an estimate that would be of some practical value. Such sacrifice of some degree of certainty should always be accompanied by a fairly precise indication either of the extent of the sacrifice or of the degree of certainty which remains after the sacrifice has been made. It is usually customary to follow the latter practice, that of specifying the degree of certainty which remains. In the discussion which follows, however, the term confidence will be used in lieu of the phrase "degree of certainty."

In deriving an interval estimate of $\theta$ from a given random sample, we are dealing with an event of uncertain outcome in the sense that the particular interval obtained either does or does not include the value $\theta$. If the sampling and estimating procedure were to be repeated a second time, the sample values and consequently the interval limits would almost certainly differ to some extent from those previously obtained owing to the operation of chance or random sampling factors, and again the interval either would or would not contain $\theta$. If, through repetition of the sampling and estimating procedures, a "large" number of intervals were obtained, a certain proportion of them would contain $\theta$ and a certain proportion would not. As a quantitative index of our confidence that an interval contains $\theta$ we shall use the relative frequency (probability) with which intervals containing $\theta$ occur in the theoretical universe of such intervals that would arise from an infinity of repetitions of the sampling and estimating procedures.

Suppose that in such a universe of intervals, .95 contained $\theta$. We shall refer to any particular one of these intervals as a 95 per cent confidence interval. This does not mean that the probability that this particular interval contains $\theta$ is .95. Either this particular interval contains $\theta$ or it does not. However, we do know that for the infinite universe of intervals derived by repeated application of the same procedure which led to the particular interval at hand, the probability of intervals containing $\theta$ is .95. In other words, the procedure used is of such a nature as to yield intervals 95 per cent of which contain $\theta$. It is important that the student recognize that the value .95 may be interpreted as a probability only with reference to the theoretical universe of all such intervals.

In a practical sense, then, our problem is one of prescribing a procedure for deriving interval limits from the information contained in a random sample, that is, an estimating procedure which, if repeated indefinitely, would lead to a universe of intervals, some arbitrarily selected proportion of which would contain the value of the parameter ($\theta$) being estimated. It is common practice to use either .95 or .99 as the arbitrarily selected proportion, though other values may, of course, be selected. So that our discussion may be presented in general terms we shall let $\gamma$ represent this arbitrarily selected proportion. While $\gamma$ may be interpreted as a probability only with reference to a universe of intervals, its magnitude certainly influences the degree of confidence we feel that a given interval contains $\theta$. If, about each interval in the universe, we were to make the statement "this interval contains $\theta$," we would be correct 100$\gamma$ per cent of the time. Clearly, the more frequently our statements are correct, the more confident we feel about them.* It is for this reason that such interval estimates are referred to as 100$\gamma$ per cent confidence intervals, and the value $\gamma$ referred to as a confidence coefficient.

*At this point the student may find it profitable to reread Section 8.6.

We have already pointed out how absolute certainty ($\gamma = 1$) leads to a trivial interval estimate extending from negative to positive infinity. It should be fairly obvious that the larger the value of $\gamma$ the wider the resulting intervals will be. On the other hand, the use of a small value of $\gamma$, while resulting in narrower intervals, implies that we lack confidence that any given interval contains $\theta$, since this means that only a small proportion ($\gamma$) of the intervals in the universe of such intervals actually contain $\theta$. The selection of $\gamma$, then, represents an arbitrary compromise between the degree to which we wish to be confident of the interval containing $\theta$ and the degree to which we wish to "pin down" our estimate of $\theta$ to a narrow or limited range of values. While we naturally wish to "pin down" our estimates as much as possible, there is no point in doing so at the sacrifice of a reasonable degree of confidence that the resulting intervals contain $\theta$. As we have previously indicated, $\gamma$ is usually taken to be either .95 or .99. Occasional use has also been made of the value of .90 and even of the value .50, but as a general rule the selection of any value less than .90 ought to be accompanied by special justification. As one might intuitively expect, it is always possible to improve estimates by collecting more information, that is, by increasing the size of the sample. Obviously, if interval estimates for $\gamma$-values of .99 or .95 are too wide to suit our purposes, we should seek to narrow them by collecting more information rather than by further reducing the value of $\gamma$.
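
The idea of a "universe of intervals" can be imitated by simulation. The sketch below is an illustration only: the normal population with mean 100 and standard deviation 15, the sample size, the number of repetitions, and the function names are all assumptions chosen for the demonstration, and the interval used is the large-sample interval for a mean.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, pstdev


def interval_for_mean(sample, gamma):
    """Approximate 100*gamma per cent interval: mean +/- z * s / sqrt(N - 1)."""
    z = NormalDist().inv_cdf(0.5 + gamma / 2)
    half_width = z * pstdev(sample) / sqrt(len(sample) - 1)
    center = fmean(sample)
    return center - half_width, center + half_width


def coverage(theta, sigma, n, gamma, repetitions, seed=0):
    """Proportion of repeated-sample intervals that contain theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        lower, upper = interval_for_mean(sample, gamma)
        hits += lower <= theta <= upper
    return hits / repetitions


# Hypothetical population: normal, mean 100, standard deviation 15.
print(coverage(theta=100.0, sigma=15.0, n=65, gamma=0.95, repetitions=2000))
```

Each single interval either contains 100 or does not; only the proportion over many repetitions approaches $\gamma$, and then only approximately, since the sketch estimates the standard error from each sample.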
11.3 DEFINITION OF A 100$\gamma$ PER CENT CONFIDENCE INTERVAL

Given a random sample from some population.

Let $\theta$ = the value of the population parameter to be estimated;
$\underline{\theta}$ = the lower limit of the 100$\gamma$ per cent confidence interval;
$\overline{\theta}$ = the upper limit of the 100$\gamma$ per cent confidence interval;
S = the statistic corresponding to $\theta$;*
$S_1$ = the particular value of S for the given sample.

FIGURE 11.1 Sampling distribution of a statistic, S, corresponding to a parameter, $\theta$ (horizontal axis: S-scale)

Now suppose that the sampling distribution of S is as shown in Figure 11.1. Then $\underline{\theta}$ and $\overline{\theta}$ which prescribe a 100$\gamma$ per cent confidence interval for the given sample may be defined as follows:

$$\underline{\theta} = S_1 - d \qquad (11.1a)$$

$$\overline{\theta} = S_1 + c \qquad (11.1b)$$

*In general, S should be some "best" point estimate of $\theta$. While some such point estimates are not the sample counterparts of the parameter involved, no such situations are treated in this text.



where d and c are distances as defined by Figure 11.1. That is, c and d are distances such that the probability of S in the range extending from a point which is a distance c below $\theta$ (point C) to a point which is a distance d above $\theta$ (point D) is $\gamma$.

Figure 11.1 shows that, in the universe of such intervals, 100$\gamma$ per cent of them contain $\theta$. This follows from the facts: (1) that for every sample yielding an S-value in the range bounded by C and D, the interval as defined by (11.1) will contain $\theta$, while for every sample yielding an S-value not in this range the interval as defined will not contain $\theta$ (see Figure 11.1), and (2) that the probability of samples which yield an S-value in the range bounded by C and D is $\gamma$.

The interval bounded by $\underline{\theta}$ and $\overline{\theta}$ is a 100$\gamma$ per cent confidence interval since in the universe of all such intervals 100$\gamma$ per cent of them contain $\theta$.

Figure 11.2 is intended to help the student grasp fact (1) above. This figure shows how the placement of a 100$\gamma$ per cent confidence interval varies in the case of four imaginary samples yielding for S the values represented by $S_1$, $S_2$, $S_3$, and $S_4$. The four scales shown are like that of Figure 11.1. The values $S_1$ and $S_2$ both fall in the range bounded by C and D and the $\underline{\theta}$, $\overline{\theta}$ intervals are seen to contain $\theta$. The values $S_3$ and $S_4$ fall outside of this range and consequently the $\underline{\theta}$, $\overline{\theta}$ intervals do not contain $\theta$.

FIGURE 11.2 Scales showing placement of 100$\gamma$ per cent confidence intervals for different obtained values of S

It is important to note that (11.1) does not define a unique range from C to D for the given value of $\gamma$, because it is possible to select different sets of c and d distances each of which establishes a range of values (C to D) on the S-scale such that the probability of S in this range is $\gamma$. The c and d distances shown in Figure 11.1 could, for example, be varied by making an increase in the length of c and a compensating decrease in the length of d. The best selection of the c and d distances for a given value of $\gamma$ is that which results in the shortest distance from C to D. In some situations this criterion may prove difficult to apply. However, if the distribution of the statistic (S) is symmetrical, then the best selection simply consists in making the c and d distances equal. In the case of a symmetrical distribution, this, of course, amounts to determining the c and d distances in such a way as to make the proportion of the distribution below C equal that above D. In fact, the practice of making the proportions in the distribution below C and above D equal is, because of its convenience, very commonly used in the case of asymmetric distributions in spite of the fact that it may not result in the best values for c and d. The situations considered in this text are limited to those in which the sampling distributions are symmetrical. Consequently, the values we shall obtain for c and d by making the proportions below C and above D equal are best values.
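
That equal c and d distances give the shortest range from C to D for a symmetrical distribution can be checked numerically. The following sketch assumes a unit normal sampling distribution, so that distances are expressed in z-units; the function name and the particular splits of the excluded area are illustrative choices only.

```python
from statistics import NormalDist


def c_and_d(gamma, area_below_c):
    """Distances c and d (in z-units) when `area_below_c` of the excluded
    area 1 - gamma lies below C and the remainder lies above D."""
    std_normal = NormalDist()
    area_above_d = (1.0 - gamma) - area_below_c
    c = -std_normal.inv_cdf(area_below_c)
    d = std_normal.inv_cdf(1.0 - area_above_d)
    return c, d


# Different ways of dividing the excluded .05 between the two tails.
for area_below_c in (0.005, 0.015, 0.025, 0.035, 0.045):
    c, d = c_and_d(0.95, area_below_c)
    print(f"area below C = {area_below_c:.3f}  c = {c:.3f}  d = {d:.3f}  c + d = {c + d:.3f}")
```

Every row corresponds to the same $\gamma$ of .95, yet the total distance c + d is smallest for the equal split of .025 in each tail.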

The use of (11.1) to determine $\underline{\theta}$ and $\overline{\theta}$ obviously requires that we be able to determine c and d. This implies that we must know the form of the sampling distribution and also that the distances c and d be independent not only of $\theta$ but of any other parameters which may control the form of the sampling distribution. When c and d are functions of $\theta$ or other parameters, either a different technique must be employed* or we must be content with a procedure which leads to intervals that are only approximately 100$\gamma$ per cent confidence intervals, that is, to a universe of intervals in which the proportion containing $\theta$ is only approximately $\gamma$. We shall use (11.1) even when c and d are functions of $\theta$ or other parameters. When the samples are large, such application of (11.1) is sufficiently accurate for practical purposes, at least when applied to the types of problems which are treated in this text. In the following sections we will show how values of c and d may be determined to provide approximate 100$\gamma$ per cent confidence intervals for a population mean ($\mu$), a population median ($\xi$), a population proportion ($\phi$), and the difference between two population means ($\Delta = \mu_1 - \mu_2$).


11.4 THE 100$\gamma$ PER CENT CONFIDENCE INTERVAL FOR A POPULATION MEAN

Given a large random sample of N cases from some population.

Let $\mu$ = the population mean;
$\underline{\mu}$ = the lower limit of the 100$\gamma$ per cent confidence interval;
$\overline{\mu}$ = the upper limit of the 100$\gamma$ per cent confidence interval;
$\bar{X}$ = the mean of any such sample;
$\bar{X}_1$ = the mean of the particular sample at hand;
$\sigma_{\bar{X}}$ = the standard error of the $\bar{X}$ sampling distribution.

*Such a technique is available but is beyond the scope of this text.



Then by Rule 9.2 we know the sampling distribution of $\bar{X}$ to be approximately a normal distribution. This approximate sampling distribution is pictured in Figure 11.3. Two facts are apparent in this situation.

FIGURE 11.3 Location of C and D in the case of the approximate sampling distribution of $\bar{X}$ for large random samples ($\bar{X}$-scale, with C a distance c below $\mu$ and D a distance d above $\mu$)

First, since the distribution is symmetrical, the distances c and d should be made equal. Second, these distances are determined entirely by the choice of $\gamma$ and by the magnitude of $\sigma_{\bar{X}}$ and, hence, are independent of $\mu$. To determine c (or d) we need only (1) to refer to Table II, Appendix C, to obtain the value of z such that the probability of a z in the range bounded by C and $\mu$ is $\gamma/2$, and (2) to transform this z-value into units of the $\bar{X}$-scale by simply finding the product of this z-value and $\sigma_{\bar{X}}$. If we let $z_{\gamma/2}$ represent this z-value, we may write the following formulas for the limits of a 100$\gamma$ per cent confidence interval for the sample at hand:

$$\underline{\mu} = \bar{X}_1 - \sigma_{\bar{X}} z_{\gamma/2} \qquad (11.2a)$$

$$\overline{\mu} = \bar{X}_1 + \sigma_{\bar{X}} z_{\gamma/2} \qquad (11.2b)$$

Before we can use (11.2) to determine $\underline{\mu}$ and $\overline{\mu}$ we need to know the value of $\sigma_{\bar{X}}$. If the sample is large, we can obtain a satisfactory approximation by using (9.25) to obtain an estimate ($\tilde{\sigma}_{\bar{X}}$) of $\sigma_{\bar{X}}$ based on the sample. This results in the following formulas:

$$\underline{\mu} = \bar{X}_1 - \tilde{\sigma}_{\bar{X}} z_{\gamma/2} = \bar{X}_1 - \frac{s}{\sqrt{N-1}} z_{\gamma/2} \qquad (11.3a)$$

$$\overline{\mu} = \bar{X}_1 + \tilde{\sigma}_{\bar{X}} z_{\gamma/2} = \bar{X}_1 + \frac{s}{\sqrt{N-1}} z_{\gamma/2} \qquad (11.3b)$$

The values of $\underline{\mu}$ and $\overline{\mu}$ as prescribed by (11.3) are the limits of a confidence interval for which the confidence coefficient is only approximately $\gamma$. The approximate character of this interval is due (1) to the fact that the sampling distribution of $\bar{X}$ only tends toward a normal distribution as N becomes large, and (2) to the use of an estimate for the value of $\sigma_{\bar{X}}$. For remarks on the size sample necessary to justify the use of (11.3) the student is referred to the concluding paragraph of Section 9.4. Illustrations of the application of (11.3) follow.


Example 1. Using the data of Solution I of the problem of the principal and the superintendent, determine the 99 per cent confidence interval for the mean IQ of the population of school children involved. (See Sections 10.3 and 10.4.)

Solution. Here N = 65, $\bar{X}_1 = 94$, and s = 20. Also from Column 2 of Table II, Appendix C, we see that $z_{.495} = 2.58$. Hence, application of (11.3) gives:

$$\underline{\mu} = 94 - \frac{20}{\sqrt{65-1}}(2.58) = 94 - 6.45 = 87.55$$

$$\overline{\mu} = 94 + \frac{20}{\sqrt{65-1}}(2.58) = 94 + 6.45 = 100.45$$

Comment. The student may wonder how it is possible that Solution I, with $\alpha = .01$, led to the decision to reject the hypothesis that $\mu = 100$ while the value 100 falls in the 99 per cent confidence interval just obtained. It will be recalled, however, that in this solution the superintendent elected, for the purpose of the decision required of him, to ignore the possibility that $\mu > 100$. Hence, the level of significance he adopted ($\alpha = .01$) actually corresponds to the use of $\gamma/2 = .49$, that is, to a $\gamma$ value of .98. As an exercise the student may wish to obtain the 98 per cent confidence interval for $\mu$ using the data of the above example. If this is done it will be found that the resulting interval does not contain the rejected hypothetical value of 100.
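
The computation in (11.3) lends itself to a brief sketch. The function below is an illustration, not part of the text; it assumes the form s/√(N - 1) for the estimated standard error and takes z from the normal curve directly rather than from Table II, so its limits may differ from the hand computation in the second decimal place.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, s, n, gamma):
    """Limits of the approximate 100*gamma per cent interval of (11.3)."""
    z = NormalDist().inv_cdf(0.5 + gamma / 2)  # z such that the area from the mean to z is gamma / 2
    half_width = (s / sqrt(n - 1)) * z
    return sample_mean - half_width, sample_mean + half_width


# Data of Example 1: N = 65, sample mean 94, s = 20.
for gamma in (0.99, 0.98):
    lower, upper = mean_confidence_interval(94.0, 20.0, 65, gamma)
    print(f"gamma = {gamma:.2f}: {lower:.2f} to {upper:.2f}")
```

Run with $\gamma$ = .99 the interval includes 100, while with $\gamma$ = .98 it does not, which is the point made in the Comment.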


As was explained in the foregoing section, the 99 per cent confidence interval, which we have just established by making c = d, is only one of an unlimited number of 99 per cent confidence intervals which could be established for the particular collection of data at hand. It is possible to establish 99 per cent confidence intervals using unequal c and d distances or even to establish 99 per cent confidence intervals which are open at one end. For example, let $d = \infty$ and c be a distance such that the probability of $\bar{X}$ between $\mu$ and C is ($\gamma - .50$). Then in our example,

$$c = \tilde{\sigma}_{\bar{X}} z_{.49} = \frac{20}{\sqrt{65-1}}(2.33) = 5.83$$

Now applying (11.3) we have

$$\underline{\mu} = \bar{X}_1 - d = 94 - \infty = -\infty$$

and

$$\overline{\mu} = \bar{X}_1 + c = 94 + 5.83 = 99.83$$

This interval estimates the value of $\mu$ as being something less than 99.83. While this interval has the same confidence coefficient ($\gamma = .99$) as the interval previously determined, it provides a less precise estimate of $\mu$ because of its greater width resulting from the fact that its lower end is left open. By allowing c and d to differ, it is possible to obtain different intervals all having $\gamma$ confidence coefficients. As we have previously indicated, the best confidence interval for a given value of $\gamma$ is in general the narrowest one, though occasional situations may arise in which only an upper (or lower) limit is needed. When the sampling distribution is symmetrical, the narrowest 100$\gamma$ per cent confidence interval is, of course, the one for which c = d.
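
The open-ended interval can be sketched in the same way. This is an illustrative sketch only; the function name is hypothetical, the lower limit is represented by negative infinity, and z is taken from the normal curve rather than from Table II.

```python
from math import inf, sqrt
from statistics import NormalDist


def upper_bounded_interval(sample_mean, s, n, gamma):
    """Interval open at the lower end: d is infinite and the area
    between the mean and C is gamma - .50."""
    z = NormalDist().inv_cdf(gamma)  # area below z is .50 + (gamma - .50)
    c = (s / sqrt(n - 1)) * z
    return -inf, sample_mean + c


lower, upper = upper_bounded_interval(94.0, 20.0, 65, 0.99)
print(lower, round(upper, 2))
```

With the data of Example 1 the upper limit agrees with 99.83 to within rounding of the z-value.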


Example 2. Using the data of Solution III of the problem of the principal and the superintendent, determine the 95 per cent confidence interval for the mean IQ of the population of school children involved. (See Section 10.7.)

Solution. Here N = 50, $\bar{X}_1 = 94$, and s = 20. Also from Column 2 of Table II, Appendix C, we see that $z_{.475} = 1.96$. Hence, application of (11.3) gives:

$$\underline{\mu} = 94 - \frac{20}{\sqrt{50-1}}(1.96) = 94 - 5.6 = 88.4$$

$$\overline{\mu} = 94 + \frac{20}{\sqrt{50-1}}(1.96) = 94 + 5.6 = 99.6$$


11.5 THE 100$\gamma$ PER CENT CONFIDENCE INTERVAL FOR THE MEDIAN OF A NORMALLY DISTRIBUTED POPULATION

By Rule 9.3 we know that the sampling distribution of the median tends to be approximately a normal distribution when N becomes large. Moreover, by Rule 9.4b we know that the standard error of this sampling distribution is given by $1.25\sigma_{\bar{X}}$ if the population is normal. Hence, given a large (N > 50) random sample, the reasoning of the foregoing section may be applied to the problem of approximating a 100$\gamma$ per cent confidence interval for the median ($\xi$) of a normally distributed population. The formulas for the lower ($\underline{\xi}$) and upper ($\overline{\xi}$) limits are as follows:

$$\underline{\xi} = mdn_1 - \frac{1.25\,s}{\sqrt{N-1}} z_{\gamma/2} \qquad (11.4a)$$

$$\overline{\xi} = mdn_1 + \frac{1.25\,s}{\sqrt{N-1}} z_{\gamma/2} \qquad (11.4b)$$

where $mdn_1$ is the median of the particular sample at hand.

Example. Using the data of Solution I of the problem of the principal and the superintendent, determine the 99 per cent confidence interval for the median IQ of school children involved. (See Section 10.3.)*

*In this solution $mdn_1 = 93$, s = 20, and N = 65.


Solution.

    $\underline{\xi} = 93 - \frac{(1.25)(20)}{\sqrt{65 - 1}}(2.58) = 93 - 8.06 = 84.94$

    $\overline{\xi} = 93 + \frac{(1.25)(20)}{\sqrt{65 - 1}}(2.58) = 93 + 8.06 = 101.06$


Comment. Notice that the width of this interval is 16.12 as compared
with a width of 12.90 for the 99 per cent confidence interval of μ obtained in
Example 1 of the preceding section, in spite of the fact that N and s are
the same in both instances. This is, of course, due to the fact that the
median varies more from sample to sample than the mean
($\sigma_{mdn} = 1.25\sigma_{\bar{X}}$) and, hence, cannot be as accurately
estimated from a sample of a given size.
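
The comparison made in this comment can be reproduced with the following Python sketch. The function name is an illustrative assumption; the factor 1.25 and the data (median 93, s = 20, N = 65, z = 2.58) are those of the example above.

```python
import math

def median_confidence_interval(sample_median, s, n, z):
    """Approximate limits in the form of (11.4): the standard error of the
    median is taken as 1.25 times the estimated standard error of the mean."""
    half_width = z * 1.25 * s / math.sqrt(n - 1)
    return sample_median - half_width, sample_median + half_width

lower, upper = median_confidence_interval(93, 20, 65, 2.58)
median_width = upper - lower

# Width of the corresponding 99 per cent interval for the mean (same N and s)
mean_width = 2 * 2.58 * 20 / math.sqrt(65 - 1)

print(round(lower, 2), round(upper, 2))
print(round(median_width / mean_width, 2))
```

Whatever the values of N, s, and z, the interval for the median is 1.25 times as wide as that for the mean, since both are built on the same estimated standard error.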


11.6 THE 100γ PER CENT CONFIDENCE INTERVAL FOR A
POPULATION PROPORTION


Given an infinite dichotomous population, the units of which either
do or do not belong to Class A. Then by Rule 9.5 we know that as N
becomes large,* the sampling distribution of the sample proportion, p, of
A-type units tends toward a normal distribution. The standard error of this
distribution is given by

    $\sigma_p = \sqrt{\frac{\phi(1 - \phi)}{N}}$    [see (9.9)]

where φ is the proportion of A-type units in the population.

If we apply the reasoning of Section 11.4 to the problem of approximating
a 100γ per cent confidence interval for the population proportion (φ), the
formulas for the lower ($\underline{\phi}$) and upper ($\overline{\phi}$)
limits are as follows:

    $\underline{\phi} = p_1 - \sigma_p z_{\gamma/2}$    (11.5a)

    $\overline{\phi} = p_1 + \sigma_p z_{\gamma/2}$    (11.5b)

where $p_1$ is the proportion of A-type units in the particular sample at hand.
It is obvious, however, that these formulas cannot be applied, since
the magnitude of $\sigma_p$ is itself based on φ, the very value which we seek
to estimate. In other words, we are here confronted with a situation in which
our method of determining confidence intervals fails owing to the fact that
the magnitudes of the c and d distances depend upon the magnitude of the
very parameter we wish to estimate. However, if N is large, it can be


shown that the use of

    $\tilde{\sigma}_p = \sqrt{\frac{p_1(1 - p_1)}{N}}$    (11.6)

in place of $\sigma_p$ in (11.5) leads to values of $\underline{\phi}$ and
$\overline{\phi}$ which, for most practical purposes, serve quite adequately
as approximations of the limits of the 100γ per cent confidence interval.*
Hence, if the availability of large random samples is presumed, formulas
(11.5) may be revised as follows:

    $\underline{\phi} = p_1 - \tilde{\sigma}_p z_{\gamma/2}$    (11.7a)

    $\overline{\phi} = p_1 + \tilde{\sigma}_p z_{\gamma/2}$    (11.7b)

where $\tilde{\sigma}_p$ is as given by (11.6).

*For remarks regarding the size of sample necessary to the practical
application of this theory, see the concluding paragraph of Section 9.6.

Example 1. Using the data of Solution IV of the problem of the principal
and the superintendent (see Section 10.8), determine the 99 per cent
confidence interval for the population proportion of school children having
IQ scores below 90.

Solution. Here N = 100 and $p_1 = .36$. Hence, application of (11.7)
gives:

    $\underline{\phi} = .36 - \sqrt{\frac{.36(1 - .36)}{100}}\,(2.58) = .36 - .124 = .236$

    $\overline{\phi} = .36 + \sqrt{\frac{.36(1 - .36)}{100}}\,(2.58) = .36 + .124 = .484$


Example 2. Using the data of Solution V of the problem of the principal
and the superintendent (see Section 10.9), determine the 95 per cent
confidence interval for the population proportion of school children having
IQ scores below 100.

Solution. Here N = 100 and $p_1 = .61$. Hence, applying (11.7) we
obtain:

    $\underline{\phi} = .61 - \sqrt{\frac{.61(1 - .61)}{100}}\,(1.96) = .61 - .096 = .514$

    $\overline{\phi} = .61 + \sqrt{\frac{.61(1 - .61)}{100}}\,(1.96) = .61 + .096 = .706$
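
The two solutions above may be verified with a brief Python sketch of formulas (11.6) and (11.7). The function names are illustrative assumptions. The second function is included only for comparison: it is the standard Wilson score form, assumed here to be the kind of more general technique that does not require substituting the sample proportion for φ.

```python
import math

def proportion_confidence_interval(p1, n, z):
    """Approximate limits in the form of (11.7), with the estimated
    standard error sqrt(p1 * (1 - p1) / n) of (11.6)."""
    half_width = z * math.sqrt(p1 * (1 - p1) / n)
    return p1 - half_width, p1 + half_width

def wilson_interval(p1, n, z):
    """Wilson score limits (an assumption made for comparison only)."""
    center = (2 * n * p1 + z * z) / (2 * (n + z * z))
    half_width = z * math.sqrt(4 * n * p1 - 4 * n * p1 * p1 + z * z) / (2 * (n + z * z))
    return center - half_width, center + half_width

example_1 = proportion_confidence_interval(0.36, 100, 2.58)
example_2 = proportion_confidence_interval(0.61, 100, 1.96)
print([round(limit, 3) for limit in example_1])
print([round(limit, 3) for limit in example_2])
print([round(limit, 3) for limit in wilson_interval(0.36, 100, 2.58)])
```

For samples as large as these the two kinds of limits lie close together, which is the sense in which (11.7) serves adequately as an approximation.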


*Application of a different and more general technique for establishing
100γ per cent confidence intervals, a technique beyond the scope of this
text, leads to the following formulas for $\underline{\phi}$ and
$\overline{\phi}$:

    $\underline{\phi}, \overline{\phi} = \frac{2Np_1 + z_{\gamma/2}^2 \mp z_{\gamma/2}\sqrt{4Np_1 + z_{\gamma/2}^2 - 4Np_1^2}}{2(N + z_{\gamma/2}^2)}$


11.7 THE 100γ PER CENT CONFIDENCE INTERVAL FOR THE
DIFFERENCE BETWEEN THE MEANS OF TWO POPULATIONS


Given a random sample of $n_1$ cases from a population having mean $\mu_1$
and an independent random sample of $n_2$ cases from a second population
having mean $\mu_2$. Let $\bar{X}_1$ and $\bar{X}_2$ be the respective means
of these samples. Then by Rule 9.7 we know that as $n_1$ and $n_2$ become
large, the sampling distribution of the difference $\bar{X}_1 - \bar{X}_2$
tends toward a normal distribution having a mean of $\mu_1 - \mu_2$, and a
standard error of $\sqrt{\sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2}$
[see (9.11)]. Hence, given large independent random samples from each of two
populations, it is possible to apply the reasoning of Section 11.4 to the
problem of approximating the 100γ per cent confidence interval for the
difference between the two population means. If we let $\underline{\Delta}$
and $\overline{\Delta}$ represent respectively the lower and upper limits of
this interval estimate, $D_1$, the $\bar{X}_1 - \bar{X}_2$ difference for the
particular set of samples at hand, and if we use formula (9.28) to estimate
the standard error of the $\bar{X}_1 - \bar{X}_2$ sampling distribution, we
have the following formulas for the approximate 100γ per cent confidence
interval of the $\mu_1 - \mu_2$ difference:

    $\underline{\Delta} = D_1 - z_{\gamma/2}\sqrt{\frac{s_1^2}{n_1 - 1} + \frac{s_2^2}{n_2 - 1}}$    (11.8a)

    $\overline{\Delta} = D_1 + z_{\gamma/2}\sqrt{\frac{s_1^2}{n_1 - 1} + \frac{s_2^2}{n_2 - 1}}$    (11.8b)


Example 1. Using the data of the psychological problem Experiment
I (see Table 10.4), obtain the 99 per cent confidence interval for the
difference between the means of the hypothetical punishment (P) and
no-punishment (NP) populations.

Solution. Here $n_P = 50$, $n_{NP} = 65$,

    $D_1 = \bar{X}_P - \bar{X}_{NP} = 19.64 - 37.58 = -17.94$

$s_P^2 = 38.5904$ and $s_{NP}^2 = 582.5505$. Hence, applying (11.8) we obtain:

    $\underline{\Delta} = -17.94 - (2.58)\sqrt{\frac{38.5904}{50 - 1} + \frac{582.5505}{65 - 1}} = -17.94 - 8.11 = -26.05$

    $\overline{\Delta} = -17.94 + (2.58)\sqrt{\frac{38.5904}{50 - 1} + \frac{582.5505}{65 - 1}} = -17.94 + 8.11 = -9.83$


Comment. Note that the negative signs simply indicate the direction 
of the difference. In this example they imply that the mean of the punish- 
ment population is the smaller. Since the criterion scores consisted of the 
number of trials required for learning, it follows that the negative limits 
indicate more rapid learning on the average for the punishment population.

Example 2. Using the data of the psychological problem Experiment
II (see Table 10.5), obtain the 95 per cent confidence interval for the
difference between the means of the hypothetical punishment-of-both* (PB)
and punishment-of-failures-only (PF) populations.

Solution. Here $n_{PB} = n_{PF} = 50$,

    $D_1 = \bar{X}_{PB} - \bar{X}_{PF} = 19.44 - 22.14 = -2.70$

$s_{PB}^2 = 25.6064$ and $s_{PF}^2 = 34.1204$. Hence, applying (11.8) we obtain:

    $\underline{\Delta} = -2.70 - (1.96)\sqrt{\frac{25.6064}{50 - 1} + \frac{34.1204}{50 - 1}} = -2.70 - 2.16 = -4.86$

    $\overline{\Delta} = -2.70 + (1.96)\sqrt{\frac{25.6064}{50 - 1} + \frac{34.1204}{50 - 1}} = -2.70 + 2.16 = -0.54$

*I.e., both successes and failures.
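
The computations of Examples 1 and 2 can be checked with the following Python sketch of formulas (11.8). The function name and argument names are illustrative assumptions; the variances are assumed to be those computed with divisors $n_1$ and $n_2$, so that each is divided by one less than its sample size.

```python
import math

def mean_difference_interval(mean_1, var_1, n_1, mean_2, var_2, n_2, z):
    """Approximate limits in the form of (11.8) for the difference between
    two population means, given large independent random samples."""
    d = mean_1 - mean_2
    standard_error = math.sqrt(var_1 / (n_1 - 1) + var_2 / (n_2 - 1))
    return d - z * standard_error, d + z * standard_error

# Example 1: punishment (P) versus no-punishment (NP), 99 per cent
example_1 = mean_difference_interval(19.64, 38.5904, 50, 37.58, 582.5505, 65, 2.58)
# Example 2: punishment-of-both (PB) versus punishment-of-failures-only (PF), 95 per cent
example_2 = mean_difference_interval(19.44, 25.6064, 50, 22.14, 34.1204, 50, 1.96)

print([round(limit, 2) for limit in example_1])
print([round(limit, 2) for limit in example_2])
```

Both intervals lie wholly below zero, which corresponds to the direction of difference discussed in the comment on Example 1.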


Example 3. Using the data of the psychological problem Experiment
III (see Table 10.6), obtain the 95 per cent confidence interval for the mean
($\mu_D$) of the hypothetical population of D-scores ($D = X_{PB} - X_{PF}$)
for matched pairs of subjects representing the hypothetical
punishment-of-both (PB) and punishment-of-failures-only (PF) populations.

Solution. Here we are actually dealing with a single sample of D-scores
so that (11.3) applies. Since N = 50, $\bar{D}_1 = -2.56$, and $s_D = 5.42$,
the application of (11.3) gives:

    $\underline{\mu}_D = -2.56 - \frac{5.42}{\sqrt{50 - 1}}(1.96) = -2.56 - 1.52 = -4.08$

    $\overline{\mu}_D = -2.56 + \frac{5.42}{\sqrt{50 - 1}}(1.96) = -2.56 + 1.52 = -1.04$


Comment. The limits obtained here actually provide an estimate of
the same parametric difference that was estimated in Example 2 above.
However, the width of the interval is only 3.04 as compared with 4.32 in
the case of Example 2. The increase in the precision which is indicated by
the greater narrowness of the interval ...



SOME SMALL-SAMPLE 
THEORY AND ITS 
APPLICATION 


12.1 INTRODUCTION 


In the chapters on testing statistical hypotheses and on interval
estimation, repeated references were made to the approximate character
of the techniques presented. As an example, consider the test of a
statistical hypothesis about the mean of any population.* As a test
statistic we used

    $z = \frac{\bar{X} - \mu}{\tilde{\sigma}_{\bar{X}}}$

Assuming the hypothesis to be true, we interpreted this z as a normally
distributed random variable with mean zero and variance one. Actually this
interpretation is only approximately correct. In order for this
interpretation to be exactly correct, $\bar{X}$ must be normally distributed
and its standard error ($\sigma_{\bar{X}}$) must be known. Thus our
interpretation is approximate on two counts. First, unless the population
sampled is normal, the sampling distribution of $\bar{X}$ only tends toward a
normal distribution as N becomes large (see Rule 9.2), and second, an
estimate is used in place of the true value of the standard error of this
$\bar{X}$ sampling distribution.

*Of course the population must have a finite variance (see Rule 9.2).



Now there is nothing wrong with using approximations so long as they
are sufficiently accurate to meet the practical demands of the situation.
This is true of our interpretation of the above z so long as the samples
are fairly large, say at least 50. If, however, circumstances prevent our
using large samples, our interpretation becomes too inaccurate to be of
use to us. In such situations, then, a new theory is needed which will
provide a test statistic that can be accurately interpreted regardless of
sample size.

In testing statistical hypotheses we establish a critical region in terms
of the scale of values of the test statistic, such that, if the hypothesis
under test is true, the probability of a value of the test statistic in this
region would correspond to some arbitrarily selected percentage (α) called
the level of significance. This percentage represents the degree of control
exercised over a Type I error. Thus if, as in Solution III of the problem of
the principal and the superintendent, we let α = .01 and establish the
critical region (R) as that portion of the z-scale extending downward from
-2.33, and if the hypothesis (H) under test is true, then we could,
nevertheless, expect to obtain values of z in R one one-hundredth of the
time in a large number of repetitions of the test. That is, we would reject
this true hypothesis 1 per cent of the time in the long run. Now, if the
test statistic is only approximately normally distributed with mean zero and
variance one, then it follows that our control over a Type I error is only
approximately .01. Hence, we see that the approximate character of our
knowledge of the sampling behavior of the test statistic means that we are
able to exercise only approximate control over Type I errors. If the actual
control corresponds closely to the selected value of α, the test is
appropriate in spite of its approximate character. On the other hand, if the
actual probability of the test statistic falling in R differs markedly from
this value of α, the test is inappropriate. For example, if with a small
sample the actual probability of a z below -2.33 is, say, ten instead of one
per hundred, then the use of z as a test statistic would be clearly
inappropriate, for instead of the desired degree of control of .01 over the
relative frequency of occurrence of a Type I error, the actual long-run
relative frequency of such errors would be .10.



Statistical tests which are based on test statistics for which the
sampling distributions are exactly rather than approximately known are called
exact tests. With such tests we are in a position to determine exactly the
probability of the test statistic (S) falling in some specified critical
region (R) if the hypothesis (H) is true. That is to say, we are able to
determine exactly the probability of a Type I error for a given R. This, in
turn, implies that whenever the exact sampling distribution involved is
continuous we can establish an R for any selected level of significance and
know that the probability of S in this R is exactly α if H is true. In this
text, the only exact sampling distributions which we will consider are
continuous distributions.*


12.2 A NEW INTERPRETATION OF AN OLD TEST STATISTIC

Consider again the test of a statistical hypothesis about a population
mean. We pointed out in the foregoing section that our interpretation of
the test statistic,

    $z = \frac{\bar{X} - \mu}{\tilde{\sigma}_{\bar{X}}}$

was approximate for two reasons. First, the sampling distribution of
$\bar{X}$ only tends toward a normal distribution as N becomes large, and
second, an estimated rather than true value of the standard error was
employed. Now, if the populations with which we deal are normally
distributed, then the first of these reasons for the approximate character
of our interpretation of z is eliminated. This follows from the fact that
means of random samples taken from normally distributed populations are also
normally distributed regardless of sample size (see Rule 9.1). Hence, if we
are willing to restrict ourselves to dealing with normally distributed
populations, we can in a sense cut our problem in half. That is to say, we
need only be concerned with the effect upon our interpretation of z of using
an estimated rather than a true value of the standard error of the mean.
Limiting ourselves, then, to dealing with normally distributed populations,
the problem becomes one of describing the manner in which an infinity of
values of the test statistic

    $\frac{\bar{X} - \mu}{\tilde{\sigma}_{\bar{X}}} = \frac{\bar{X} - \mu}{s/\sqrt{N - 1}}$

would be distributed if it is assumed that the hypothesis is true. Of course,
when N is large, this test statistic may be approximately interpreted as a z,
i.e., as having an approximately normal sampling distribution with mean
zero and variance one. But the smaller the value of N, the less valid this
approximate interpretation becomes. This suggests that different
interpretations are needed for different sized samples.

*Some very useful statistics, as for example the sample proportion, have
exact sampling distributions which are discrete. In such situations, the
critical region consists simply of a set of discrete points rather than a
portion of a continuous scale, and it is not possible to establish an R for
any value of α whatever such that the probability of S in R is exactly α.
This follows from the fact that since the sampling distribution is discrete
there may be no R for which the probability of S is exactly equal to the
value selected for α. The use of the exact sampling distribution in such
situations does make it possible to determine exactly the probability of S
in R if H is true, so that while we do not have complete freedom in the
choice of α, we can determine the exact probability of a Type I error for a
given R. The treatment of exact sampling distributions which are discrete is
beyond the scope of this text.



It is customary to designate this test statistic by the letter t in order to
distinguish between it, as we shall come to interpret it for small samples,
and z. Assuming the population to be normally distributed and the hypothesis
to be true, mathematical statisticians have been able to derive the
manner in which t is distributed for samples of any given size. This means
that it is possible to establish a critical region (R) in terms of the
t-scale in such a way that if the hypothesis is true, the probability of a
t in R is exactly α. Hence, through the use of t as a test statistic, we have
a test of a hypothesis about the mean of a normally distributed population
which provides for exact control over the expected or long-run frequency of
a Type I error.

Instead of describing at this point the distribution of this particular t,
we shall turn our attention to a somewhat more general treatment of this
test statistic.


12.3 THE t-STATISTIC AND ITS SAMPLING DISTRIBUTION


Let S represent any normally distributed statistic and let $\mu_S$ represent
the mean of its distribution. Also let $\tilde{\sigma}_S$ represent a
particular estimate of the standard error of this statistic. We shall not
attempt here a general statement of the particular type of estimate of
standard error which is required by this theory. Instead we shall present
for each application of the theory a specific formula for the estimate
($\tilde{\sigma}_S$) involved. It is sufficient for our purpose that the
student simply recognize that not all conceivable estimates of the standard
error of S are appropriate to the theory.

Now the mathematicians have shown that the sampling distribution of
the statistic

    $t = \frac{S - \mu_S}{\tilde{\sigma}_S}$    (12.1)

is exactly described or modeled by the mathematical curve*

    $y = \frac{C}{\left[1 + \frac{t^2}{df}\right]^{(df + 1)/2}}$    (12.2)

where df is some function of sample size and C is a rather complicated
constant the value of which depends upon that of df.†


*The original derivation of this distribution is due to an eminent British
statistician, William Sealy Gosset, who, because of a ruling of his employers
(Guinness Brewery, Dublin) regarding publication of research findings, wrote
under the pseudonym "Student." As a result, the sampling distribution of t
has come to be known as "Student's" distribution.

†$C = \frac{[(df - 1)/2]!}{\sqrt{\pi\,df}\;[(df - 2)/2]!}$

The df-value is discussed in Section 12.4. Note here that df is to be
interpreted as a single symbol and not as the product of d times f.



Table 12.1 shows the values of y corresponding to selected values of t
for df-values of 3, 15, 29, and infinity. Plots of these curves except for
df = 29 are shown in Figure 12.1. For the purpose to which we will put
this theory it is not necessary that the student be able to verify the values
given in Table 12.1. It is sufficient that he acquire only a general knowledge
of the characteristics of the t sampling distributions modeled by (12.2).

TABLE 12.1 Ordinates of t-Curve for Selected Values of t and df

     t      df = 3    df = 15    df = 29    df = ∞
     0       .368      .392       .396       .399
   ±0.5      .313      .344       .348       .352
   ±1.0      .207      .234       .238       .242
   ±1.5      .120      .128       .129       .130
   ±2.0      .068      .059       .058       .054
   ±2.5      .039      .024       .021       .018
   ±3.0      .023      .009       .007       .004
   ±3.5      .014      .003       .002       .001
   ±4.0      .009      .001       .001       .000

[Figure 12.1 The t-curves for the df-values of 3, 15, and ∞ (horizontal axis: t-scale)]

The more important of these characteristics are:

1. As the value of df approaches infinity the t-curve approaches the normal
curve for which μ = 0 and σ = 1. In other words as df becomes large, t-values
may be interpreted as z-values.* That the approach is quite rapid is evident
from a comparison of the curves for df = 15 and for df = ∞ as shown in
Figure 12.1, and also from a comparison of the y-values given in Table 12.1
for the curves for which df = 29 and df = ∞.

2. The t-curve is symmetrical and bell-shaped with center at t = 0 but
varies in form with the value of df. When df is small, the proportion of the
area beyond extreme t-values is much greater than that of the normal curve
beyond corresponding z-values. For example, in the normal curve, .023 of
the area lies above z = +2, but in the t-curve for df = 3, .070 of the area
lies above t = +2. There are, then, actually many different curves
represented by (12.2), one for each value of df.

3. For t-values arising from the repetition of a particular sampling
situation there is a particular t-curve which provides an exact model of the
sampling distribution of the statistic t in that situation. The problem, of
course, is to select from among all t-curves that particular one which is
appropriate as a model in the given situation. This is done through use of
the df-value. The following section treats the role of the df-value in this
sampling theory.

4. The area under any t-curve is one. This, of course, must be true of
any curve which serves as a model of a sampling distribution since such
distributions are by definition relative frequency distributions. Since the
area of the portion of the curve above a designated segment of the t-scale is
interpreted in the model as representing the frequency of t-values in this
segment of the scale, and since the total area under the curve is one, it
follows that the area above such a segment actually represents the relative
frequency or probability of t-values in this segment of the scale.
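
The characteristics listed above can be examined numerically. The following Python sketch evaluates the curve of (12.2), writing the constant C in terms of the gamma function (for which (df - 1)/2 factorial is Γ((df + 1)/2)); the function names, the integration range, and the step size are illustrative assumptions. The logarithmic form `math.lgamma` is used only so that large df-values do not overflow.

```python
import math

def t_curve_ordinate(t, df):
    """Ordinate y of the t-curve (12.2) for a given t and df."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(math.pi * df)
    return math.exp(log_c) / (1 + t * t / df) ** ((df + 1) / 2)

def area_under_t_curve(df, low, high, step=0.01):
    """Trapezoidal approximation of the area above the segment low..high."""
    n = int(round((high - low) / step))
    total = 0.5 * (t_curve_ordinate(low, df) + t_curve_ordinate(high, df))
    for i in range(1, n):
        total += t_curve_ordinate(low + i * step, df)
    return total * step

print(round(t_curve_ordinate(0, 3), 3), round(t_curve_ordinate(1.0, 15), 3))
print(round(area_under_t_curve(3, -60, 60), 3))    # total area, very nearly one
print(round(area_under_t_curve(3, 2, 60), 3))      # area above t = +2 for df = 3
print(round(t_curve_ordinate(0, 10000), 3))        # close to the normal ordinate at zero
```

The first line corresponds to entries of Table 12.1, and the remaining lines correspond to characteristics 4, 2, and 1 respectively.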


*Compare the ordinates corresponding to the t-values for df = 29 and for
df = ∞ in Table 12.1 with those (the y-values) corresponding to the same
z-values in Table II, Appendix C.


12.4 DEGREES OF FREEDOM

Thus far we have simply referred to the df of (12.2) as representing
some value which affects the form of the t-curve. The letters df are the
initials of the key words in the phrase degrees of freedom. The concept of
degrees of freedom as it applies to a statistic is fundamentally mathematical
and is difficult to explain intuitively. We shall, therefore, not attempt a
rational development of the degrees-of-freedom concept. Instead we shall
be content simply to state that the number of degrees of freedom of a
statistic is always some function of the number of observations from which
the statistic is computed, a function which enters into the mathematical
formula for the sampling distribution of the statistic in such a way as to
influence the form of this distribution. Thus a particular statistic may not
have a single sampling distribution but rather a family of distributions each
member of which is the appropriate distribution for a given value of this
function, that is, for a given number of degrees of freedom. The number of
degrees of freedom of a statistic is used in statistical work simply to identify
the particular mathematical curve which serves as an appropriate model
for the sampling distribution of the given statistic. Thus if the df-value for
a particular t-statistic were 3, the t-curve for which df = 3 (see Figure 12.1)
would be used as the model of the sampling distribution of this statistic.
If, on the other hand, the df-value for a particular t were 15, then the curve
for which df = 15 would be used. A rule for determining the number of
degrees of freedom of a statistic follows:

RULE. The number of degrees of freedom of a given statistic (S) is equal
to the number of observations involved minus the number of necessary auxiliary
values used in the computation of S, auxiliary values which are themselves
derived from the observations.

Consider as an example the estimated standard error of the sampling
distribution of $\bar{X}$ as given by

    $\tilde{\sigma}_{\bar{X}} = \frac{s}{\sqrt{N - 1}}$

This statistic ($\tilde{\sigma}_{\bar{X}}$) is based on N scores or
observations. To compute $\tilde{\sigma}_{\bar{X}}$ it is first necessary to
compute s. But in order to compute s one auxiliary value is necessary. This
value is that of the point from which the deviation of each observation is
measured in computing s. We use as the value of this point the mean of the
observations. Hence, the number of degrees of freedom of the statistic
$\tilde{\sigma}_{\bar{X}}$ is simply one less than the number of
observations, i.e., (N - 1).

The beginning student of statistics may expect to experience some
difficulty with the application of this rule. Hence, we shall follow the
practice of providing a formula for df which is specific to each application
of t as a test statistic that we present in this text.

In concluding this section we shall state a rule for the selection of that
member of the t-curve family of (12.2) which is appropriate as a model for
the sampling distribution of t as defined by (12.1) in a particular sampling
situation.

RULE. The t-curve which is appropriate as a model of the sampling
distribution of t in a given sampling situation is that t-curve for which the
value of df is the same as that of $\tilde{\sigma}_S$.



12.5 TABLES OF AREAS FOR t-CURVES

As we have indicated, we shall use t-curves as models of sampling
distributions of the t-statistic. In using t as a test statistic to test
statistical hypotheses, we shall need to designate portions of the t-scale as
critical regions. This in turn implies that we must have information
regarding the areas of the portions of the various t-curves lying above
designated segments of the t-scale, for otherwise we have no basis for
establishing critical regions corresponding to our selected levels of
significance.

It would, of course, be possible to develop for each t-curve a table of
areas similar to that given for the normal curve in Table II, Appendix C.
This would imply a voluminous collection of at least 30 such tables (perhaps
after df = 30 the t-curve would be enough like the normal curve to
justify the use of z as an approximate test statistic). However, we usually
select our levels of significance from among the values .001, .01, .02 or
.025, .05, .10, and .20, and our critical regions are simply located at one,
or the other, or both ends of the t-scale. Hence, there is actually no need
for information about t-curve areas beyond that which would enable us to
establish such critical regions for these selected levels of significance. It
follows that we can organize all the area information we need for at least
30 t-curves into a one-page table. Table 12.2 shows how such a table may be
organized. A complete table is given as Table VI, Appendix C.


TABLE 12.2 Probability Points of t-Curves

  df |  P = .25 |  .20 |  .10 |  .05 | .025 |  .01 | .005 |  .001 | .0005
     | 2P = .50 |  .40 |  .20 |  .10 |  .05 |  .02 |  .01 |  .002 |  .001
  ---+----------+------+------+------+------+------+------+-------+------
   3 |     0.77 | 0.98 | 1.64 | 2.35 | 3.18 | 4.54 | 5.84 | 10.21 | 12.92
  15 |     0.69 | 0.87 | 1.34 | 1.75 | 2.13 | 2.60 | 2.95 |  3.73 |  4.07
  29 |     0.68 | 0.85 | 1.31 | 1.70 | 2.04 | 2.46 | 2.76 |  3.40 |  3.66
   ∞ |     0.67 | 0.84 | 1.28 | 1.64 | 1.96 | 2.33 | 2.58 |  3.09 |  3.29


In Table 12.2 (and in Table VI, Appendix C), the df-values by means
of which we select the appropriate curve are given in the left-hand column.
Thus each row of this table applies to a different t-curve. The remaining
columns give t-values which are exceeded by the proportion P of the area of
the curve involved. These t-values may therefore be used as the lower
bounds of 100P per cent critical regions located in entirety at the upper end
of the t-scale. Since the t-curves are symmetrical and centered on zero, the
negatives of these t-values constitute the upper bounds of 100P per cent
critical regions located in entirety at the lower end of the t-scale. Finally,
the negatives and positives of these t-values prescribe symmetrical
two-ended critical regions for a level of significance equal to 2P.

Example 1. If df is 3 and the level of significance (α) is .05, establish
an R which is located entirely at the upper end of the t-scale.
Answer. R: t ≥ +2.35

Example 2. If df = 3 and α = .05, establish an R which is located in
entirety at the lower end of the t-scale.
Answer. R: t ≤ -2.35

Example 3. If df = 3 and α = .05, establish a two-ended R with area
of α/2 at each end.
Answer. R: t ≤ -3.18 and t ≥ +3.18

Comment. This R could also be designated as follows:

    |t| ≥ 3.18

Here the vertical bars indicate that the absolute value (i.e., the value
without regard to sign) of t must equal or exceed 3.18. Note that for a
two-ended R of this type the t-value is read from the column of the table for
which 2P = α.
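
The reading of the table illustrated in these three examples can be sketched in Python as follows. Only a few entries of Table 12.2 are copied into the sketch, and the dictionary and function names are illustrative assumptions.

```python
# A few entries of Table 12.2: df -> {P: t-value exceeded by proportion P of the area}
TABLE_12_2 = {
    3: {0.05: 2.35, 0.025: 3.18, 0.005: 5.84},
    15: {0.05: 1.75, 0.025: 2.13, 0.005: 2.95},
}

def upper_end_bound(df, alpha):
    """Lower bound of an R located entirely at the upper end: t >= bound."""
    return TABLE_12_2[df][alpha]

def lower_end_bound(df, alpha):
    """Upper bound of an R located entirely at the lower end: t <= bound."""
    return -TABLE_12_2[df][alpha]

def two_ended_bounds(df, alpha):
    """Bounds of a two-ended R with area alpha/2 at each end (column 2P = alpha)."""
    bound = TABLE_12_2[df][round(alpha / 2, 4)]
    return -bound, bound

def in_two_ended_region(t, df, alpha):
    """True when |t| equals or exceeds the tabled value."""
    low, high = two_ended_bounds(df, alpha)
    return t <= low or t >= high

print(upper_end_bound(3, 0.05))    # Example 1
print(lower_end_bound(3, 0.05))    # Example 2
print(two_ended_bounds(3, 0.05))   # Example 3
```

Note that the same column serves both a one-ended region at level P and a two-ended region at level 2P, which is why the two-ended lookup halves α before consulting the table.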


12.6 THE USE OF t AS A TEST STATISTIC TO TEST A HYPOTHESIS
ABOUT THE MEAN OF A NORMALLY DISTRIBUTED POPULATION

If we restrict ourselves to dealing with normally distributed populations,
the sampling distribution of $\bar{X}$ for random samples of size N will be
normally distributed with mean corresponding to the population mean μ
(see Rule 9.1). Hence, $\bar{X}$ and μ comply with the requirements
established for S and $\mu_S$ in Section 12.3. Moreover, the mathematical
statisticians have shown that

    $\tilde{\sigma}_{\bar{X}} = \frac{s}{\sqrt{N - 1}}$

is an estimate of the standard error of $\bar{X}$ which satisfies the
conditions they have imposed upon $\tilde{\sigma}_S$ of (12.1). Also, as we
have already shown in Section 12.4, the number of degrees of freedom
associated with $\tilde{\sigma}_{\bar{X}}$ is N - 1. Hence, substituting
respectively $\bar{X}$, μ, and $s/\sqrt{N - 1}$ for S, $\mu_S$, and
$\tilde{\sigma}_S$ in (12.1), we obtain

    $t = \frac{\bar{X} - \mu}{s/\sqrt{N - 1}} = \frac{\bar{X} - \mu}{s}\sqrt{N - 1}, \quad df = N - 1$    (12.3)

To use this t as a test statistic to test a hypothesis about the mean of a
normally distributed population we substitute for μ in (12.3) the value
hypothesized for it. If this hypothesis is true the long-run probability of a
t in the critical region will correspond exactly to the selected level of
significance. If this hypothesis is false the probability of a t in the
critical region will be greater depending, of course, upon the magnitude
of the error in the hypothesized value.


Example. Consider once again the problem of the principal and the
superintendent. Suppose that the superintendent follows the approach
previously described as Solution I, except that instead of instructing the
school psychologist to obtain WISC IQ scores for a random sample of 65
children he instructs her to obtain such scores for a random sample of only
5 children. Assume the five scores reported by the psychologist to have
the values 59, 65, 107, 89, and 80. The superintendent's application of t
as a test statistic to the solution of his problem is outlined below.


STEP 1. H: μ = 100; alternative μ < 100
STEP 2. α = .01 (as in Solution I)
STEP 3. R: t ≤ -3.75*
(Note: df = N - 1 = 5 - 1 = 4.)
STEP 4. Calculation of t for sample at hand.

    $\Sigma X = 59 + 65 + 107 + 89 + 80 = 400$

    $\bar{X} = 80$ and $(\Sigma X)^2/N = 32{,}000$

    $\Sigma X^2 = 59^2 + 65^2 + 107^2 + 89^2 + 80^2 = 33{,}476$

    $s = \sqrt{\frac{33{,}476 - 32{,}000}{5}} = 17.18$

Hence,

    $t = \frac{80 - 100}{17.18}\sqrt{5 - 1} = \frac{(-20)(2)}{17.18} = \frac{-40}{17.18} = -2.33$

STEP 5. Decision. Retain the hypothesis. (Why?)
It will be instructive to indicate R in terms of the scale of possible
values for $\bar{X}$. From (12.3), we have

    $\bar{X} = \frac{t\,s}{\sqrt{N - 1}} + \mu$    (12.4)

Substituting in (12.4), we obtain

    $\bar{X}_R = \frac{17.18(-3.75)}{\sqrt{5 - 1}} + 100 = -32.21 + 100 = 67.79$

Hence, in terms of the $\bar{X}$-scale the critical region is

    R: $\bar{X} \le 67.79 \approx 67.8$

In Solution I with a sample of 65 children the critical region was found
to be

    R: $\bar{X} \le 94.2$

*See Table VI, Appendix C.
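
The five steps of this example, together with the translation of R to the scale of sample means by (12.4), can be retraced with the following Python sketch. The variable and function names are illustrative assumptions; the critical value -3.75 is the tabled t-value used in Step 3, and s is computed with divisor N as in Step 4.

```python
import math

def t_statistic(scores, hypothesized_mean):
    """Return (t, df, mean, s) following (12.3), with s computed using divisor N."""
    n = len(scores)
    mean = sum(scores) / n
    s = math.sqrt((sum(x * x for x in scores) - sum(scores) ** 2 / n) / n)
    t = (mean - hypothesized_mean) / s * math.sqrt(n - 1)
    return t, n - 1, mean, s

scores = [59, 65, 107, 89, 80]
hypothesized_mean = 100
critical_t = -3.75            # Step 3: alpha = .01, df = 4, lower-end region

t, df, mean, s = t_statistic(scores, hypothesized_mean)
reject = t <= critical_t      # Step 5: the hypothesis is rejected only if t falls in R

# (12.4): the critical region expressed on the scale of sample means
critical_mean = critical_t * s / math.sqrt(len(scores) - 1) + hypothesized_mean

print(round(s, 2), round(t, 2), df, reject, round(critical_mean, 1))
```

Because the computed t does not fall in R, the hypothesis is retained, which answers the question posed in Step 5; equivalently, the sample mean of 80 is well above the critical value on the scale of means.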




That is, in Solution I (N = 65), a sample mean of 94.2 or less constitutes 
sufficient evidence to discredit the hypothesis that u = 100, whereas when 
N is as small as 5, a sample mean of 67.8 or less is necessary to discredit this 
same hypothesis. This suggests that our small-sample test is not very 
powerful; that is, unless the difference between the parameter and the 
value hypothesized for it is very great, our small-sample test is not likely 
to detect it. In other words, it appears that use of the t-test statistic with 
small samples is likely to lead to frequent commission of Type II errors 


(retention of false hypotheses). 


[Figure 12.2 Power curves for statistical tests regarding a population mean
(vertical axis: P-scale; horizontal axis: μ-scale)]

Figure 12.2 was developed to provide a more definite indication of what
may be expected by way of power from a small-sample test of a hypothesis
about a population mean. Figure 12.2 shows the power curves for the
statistical test used in Solution I of the problem of the principal and the
superintendent and for the small-sample statistical test as applied above
to the solution of this same problem. The power curve labeled N = 65 is
the same curve as is pictured in Figure 10.6, the appearance of greater
steepness here being due to the choice of scale unit. The curve labeled
N = 5 is the power curve for the t-test based on samples of 5 cases applied
to this same problem. The technique of constructing this latter curve is
beyond the scope of this text. Its interpretation, however, follows precisely
along the same lines as that of the other power curves we have studied. From
Figure 12.2 we may note that:


(1) Both tests are equally effective with respect to control over a Type I
error. This, of course, follows from the fact that a .01 level of significance
was used in both instances. The figure shows that when μ = 100 (i.e.,
when the hypothesis is true) the probability of wrongly rejecting the
hypothesis is .01 for both tests.
(2) The probability (β) of a Type II error in the case of the large-sample
test of Solution I (N = 65) is .05 when μ = 90.1. I.e., if μ differs from
the value hypothesized for it by 9.9 the probability of wrongly retaining
the false hypothesis that μ = 100 is .05 when N = 65.
(3) When μ differs from the value hypothesized by 9.9 the probability
of a Type II error in the case of the t-test is between [figures illegible in
the source].
(4) In order for the probability of a Type II error in the case of the
t-test to be as small as .05, it is necessary that the actual value of μ be
as low as 43.4; i.e., it is necessary for the actual value of μ to differ from
the value hypothesized by 56.6 IQ points.
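The figures quoted in (2) can be checked with a short normal-curve calculation. The sketch below is illustrative only: the standard error of the mean for N = 65 (about 2.49) is an assumption, inferred here from the critical region R: $\bar{X} \le 94.2$ at the .01 level rather than quoted from the text, and the function name is hypothetical.

```python
from statistics import NormalDist

def type_ii_error(mu_true, critical_value, std_error):
    """Probability that the sample mean falls above the critical value
    (so that the hypothesis is retained) when the true mean is mu_true."""
    sampling_dist = NormalDist(mu=mu_true, sigma=std_error)
    return 1.0 - sampling_dist.cdf(critical_value)

# Assumed standard error: (100 - 94.2) / z, where z cuts off the lower .01 tail.
z_01 = NormalDist().inv_cdf(0.99)
std_error = (100 - 94.2) / z_01        # inferred value, not taken from the text

beta = type_ii_error(mu_true=90.1, critical_value=94.2, std_error=std_error)
print(round(std_error, 2), round(beta, 2))
```

Under this assumption the computed β agrees with the .05 read from the N = 65 curve at μ = 90.1.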


It is obvious from the foregoing that the use of small samples results
in an extremely severe loss in terms of the power of the test to detect a
false hypothesis. Clearly, small samples should be employed only when
circumstances are such as to preclude the selection and use of large samples.
It is also very important that the student keep in mind the fact that
the foregoing small-sample theory is valid only for normally distributed
populations. Strictly speaking, if it is not reasonable to assume that the
population involved is normally distributed, the use of t as a test statistic
is inappropriate.


12.7 THE USE OF t AS A TEST STATISTIC TO TEST THE HYPOTHESIS OF NO DIFFERENCE BETWEEN THE MEANS OF TWO NORMALLY DISTRIBUTED POPULATIONS

In this section we are concerned with the application of small-sample
t-test theory to the problem of testing the hypothesis that the means of two
populations are equal.

As in the case of the application of this theory to
the testing of hypotheses about the magnitude of a population mean, it is
necessary to restrict our area of operation to populations that are normally
distributed. We shall consider this problem in two cases.

Case I. Independent random samples from equally variable populations.

In this first case it is assumed that we have selected our samples
independently and at random from two populations which are equally variable,
i.e., which have equal variances. Clearly, the requirement that the
populations have equal variances can only impose a further limitation on the
general applicability of our test. It is completely appropriate only in
situations in which the two populations involved are equally variable and
normally distributed.

Let the populations be designated as 1 and 2. We shall use these numbers
as subscripts in identifying various population or sample characteristics.
For example, we shall use $\mu_1$ and $\mu_2$ to represent respectively the
means of Populations 1 and 2. Similarly $\bar{X}_1$ and $\bar{X}_2$ will be used to represent
means of independent random samples selected respectively from Populations
1 and 2. Now if the populations are normally distributed it follows
from Rule 9.1 that the sampling distributions of $\bar{X}_1$ and $\bar{X}_2$ are also
normally distributed regardless of sample sizes. That is, $\bar{X}_1$ and $\bar{X}_2$ are two
normally distributed independent random variables and it further follows
from Rule 9.6 that the sampling distribution of $\bar{X}_1 - \bar{X}_2 = D$ is normally
distributed with mean equal to $\mu_1 - \mu_2 = \Delta$. Therefore, D and Δ satisfy
the requirements imposed upon S and $\mu_S$ of (12.1). Hence, to apply (12.1)
to this situation it remains for us only to discover the particular estimate
of the standard error ($\tilde{\sigma}_D$) of the sampling distribution of D which meets
the requirements of the t theory and to establish its degrees of freedom.

It is in connection with this problem of obtaining $\tilde{\sigma}_D$ that the
mathematical statisticians have found it convenient to impose the restriction that
the populations be equally variable. It is clear that if the populations are
equally variable, i.e., if $\sigma_1^2 = \sigma_2^2 = \sigma^2$, the best estimate that we can
obtain of this common variance ($\sigma^2$) is one which will be based on some
combination of the information about this common variance
which is contained in both samples. The mathematical statisticians have
shown that an unbiased estimate of this common variance ($\sigma^2$) results from
a sort of weighted averaging of the sample variances. If we let the variances
of the samples from Populations 1 and 2 be represented respectively by $s_1^2$
and $s_2^2$, and if we let $n_1$ and $n_2$ represent the sizes of these samples,
then an unbiased estimate of the common population variance is given by

$$\tilde{\sigma}^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} \qquad (12.5)$$


The mathematical statisticians have further shown that if this $\tilde{\sigma}^2$ is
used in place of both $\sigma_1^2$ and $\sigma_2^2$ in (9.11), i.e., if we write

$$\tilde{\sigma}_D = \sqrt{\tilde{\sigma}^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \qquad (12.6)$$

we have an estimate of the standard error of the sampling distribution of
$D = \bar{X}_1 - \bar{X}_2$ which satisfies the requirements imposed upon the $\tilde{\sigma}_S$ of
(12.1). This estimate is based on the $n_1$ observations which enter into the
determination of $s_1^2$ plus the $n_2$ observations which enter into the
determination of $s_2^2$, or a total of $n_1 + n_2$ observations. Since two
values based on the observations are used in computing these variances,
namely, $\bar{X}_1$ in the case of $s_1^2$ and $\bar{X}_2$ in the case of $s_2^2$, the number
of degrees of freedom of the statistic $\tilde{\sigma}_D$, as indicated by the rule given in
Section 12.4, is $n_1 + n_2 - 2$. We may now apply (12.1) to the problem at
hand as follows:

$$t\,(df = n_1 + n_2 - 2) = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\tilde{\sigma}^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \qquad (12.7)$$
А " sis about 
This t may be used as a test statistic to test any hypothesis about
$\Delta = \mu_1 - \mu_2$. If we are concerned specifically with the hypothesis that
$\mu_1 - \mu_2 = 0$, we may write (12.7) as follows:

$$t\,(df = n_1 + n_2 - 2) = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \qquad (12.8)$$

To make (12.8) more self-contained, we have also incorporated in it the
instructions as given in (12.5) for computing $\tilde{\sigma}^2$.

Example. Consider the psychological problem described in Section
10.16 and following. We cannot validly apply the above t-test to
the situation of Experiment I since the hypothetical punishment (P) and
no-punishment (NP) populations are clearly quite different in variability
(see Table 10.4). This difficulty does not, however, appear to exist in the
case of Experiment II (see Table 10.5). As an illustration of an application
of (12.8) we shall, therefore, consider a re-run of Experiment II involving
respective samples of seven and five cases from the hypothetical punish-
both* (PB) and punish-failures-only (PF) populations. Assume the
criterion scores for the two samples to be as follows:


PB sample: 20, 17, 10, 25, 24, 22, 15
PF sample: 26, 31, 23, 35, 20

STEP 1. H: $\Delta = \mu_{PB} - \mu_{PF} = 0$
Alternatives: Δ > 0 and Δ < 0

STEP 2. α = .01
We previously used .001 as the level of significance. In this example,
however, we shall use the somewhat less stringent value of .01.

STEP 3. R: t ≤ -3.17 and t ≥ +3.17†
(Note: $df = n_{PB} + n_{PF} - 2 = 7 + 5 - 2 = 10$)

STEP 4. Calculation of t for data at hand.

                 For PB Sample    For PF Sample
$\sum X$              133              135
$\bar{X}$              19               27
$\sum X^2$          2,699            3,791
$(\sum X)^2/n$      2,527            3,645
$\sum x^2$            172              146
$s^2$             24.5714             29.2

Hence,

$$t = \frac{19 - 27}{\sqrt{\frac{(7)(24.5714) + (5)(29.2)}{7 + 5 - 2}\left(\frac{1}{7} + \frac{1}{5}\right)}} = \frac{-8}{3.30} = -2.42$$

STEP 5. Decision. Retain the hypothesis. (Why?)

*Both successes and failures.
†See Table VI, Appendix C.
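The arithmetic of Steps 4 and 5 can be retraced with a few lines of code. This is a minimal sketch, not part of the original text: the function names are illustrative, and the variances are computed with divisor n, which is the definition of $s^2$ that (12.5) presupposes.

```python
from math import sqrt

def sample_variance_n(scores):
    """Variance with divisor n, the definition of s^2 used with (12.5)."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / n

def pooled_t(sample_1, sample_2):
    """t of (12.8) for the hypothesis of no difference between two means."""
    n1, n2 = len(sample_1), len(sample_2)
    mean_diff = sum(sample_1) / n1 - sum(sample_2) / n2
    pooled_var = (n1 * sample_variance_n(sample_1)
                  + n2 * sample_variance_n(sample_2)) / (n1 + n2 - 2)   # (12.5)
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))                    # (12.6)
    return mean_diff / std_error, n1 + n2 - 2

pb = [20, 17, 10, 25, 24, 22, 15]
pf = [26, 31, 23, 35, 20]
t, df = pooled_t(pb, pf)
critical_t = 3.17   # tabled value for df = 10, two-ended .01 level
print(round(t, 2), df, abs(t) >= critical_t)
```

The computed t lies between -3.17 and +3.17, which is why the hypothesis is retained.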

Comment. We shall not at this point consider in detail the power of
this test. It is sufficient to note that in spite of a difference of 8 between the
sample means as compared with a difference of 2.7 for the data of the
original experiment, and in spite of the use of α = .01 instead of .001, the value
of t still falls well within the region of acceptance. It is clear that sample
differences must indeed be large before our small-sample test judges them
significant, i.e., judges them indicative of real differences between
population means.


Case II. Randomly selected matched or equated pairs.

The situation here is precisely as described under Experiment III
regarding the psychological problem (see Section 10.21). That is, we obtain
a sample of matched pairs by some process such as the following.

Step 1. Select an object at random from the population and measure it with
respect to some control variable thought to contribute to individual
differences in the criterion variable being studied.

Step 2. From among all objects in the population which possess this same
measured amount of this control variable, select one at random and
pair it with the object selected in Step 1.

Step 3. Repeat Steps 1 and 2 until the desired number of matched pairs is
obtained.

Step 4. By a random process assign the members of the pairs to the two
experimental groups.
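The four steps can be sketched in code. Everything concrete here is hypothetical: the population is an invented list of (identifier, control score) records, objects are drawn without replacement, and an object for which no match exists is simply set aside. None of these details is specified in the text.

```python
import random

def select_matched_pairs(population, n_pairs, rng):
    """population: list of (object_id, control_score) tuples.
    Returns a list of (group_1_member, group_2_member) pairs."""
    available = list(population)
    pairs = []
    while len(pairs) < n_pairs:                      # Step 3: repeat Steps 1 and 2
        if not available:
            raise ValueError("population exhausted before enough pairs were found")
        first = rng.choice(available)                # Step 1: random object
        available.remove(first)
        matches = [obj for obj in available if obj[1] == first[1]]
        if not matches:                              # nobody shares this control score
            continue
        second = rng.choice(matches)                 # Step 2: random match
        available.remove(second)
        pair = [first, second]
        rng.shuffle(pair)                            # Step 4: random assignment to groups
        pairs.append(tuple(pair))
    return pairs

rng = random.Random(0)
# Hypothetical population: 200 objects with control scores 90, 95, ..., 110.
population = [(i, rng.choice([90, 95, 100, 105, 110])) for i in range(200)]
pairs = select_matched_pairs(population, n_pairs=11, rng=rng)
print(len(pairs))
```

Shuffling each pair before recording it is one simple way to carry out the random assignment of Step 4; a coin toss per pair would serve equally well.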


In this design we deal directly with pairs rather than individual objects.
The score for a pair is taken to be the difference (D) between the criterion
score ($X_1$) for the member of the pair assigned to Group 1 and the criterion
score ($X_2$) for the member of the pair assigned to Group 2. Thus, if there
are N pairs, we have a random sample of N D-scores from a hypothetical
population of D-scores such as might be generated by indefinite repetition
of this selection procedure. As was explained in connection with
Experiment III, a test of the hypothesis that the mean of such a population
of D-scores is zero is equivalent to a test of the hypothesis of no difference
between the means of the two hypothetical populations represented in each
of the pairs.

Now if the two hypothetical populations of X-scores are normally
distributed with respect to the criterion measure, we know from an earlier
rule that the population of D's is normally distributed with $\mu_D = \mu_1 - \mu_2$.
Moreover, this is true regardless of sample size and regardless of whether
or not the original populations are equally variable. Hence, our problem
becomes simply one of testing a hypothesis about the magnitude of the
mean of a single, normally distributed population of D-values. This
involves a straightforward application of the theory and techniques of the
preceding section (12.6). We shall, nevertheless, rewrite (12.3) in terms of
the following notation:

Let N = number of D-values (pairs) in the sample.
$\bar{D}$ = the mean of the sample of D-values.
$s_D$ = the standard deviation of the sample of D-values.
$\mu_D$ = the mean of the population of D-values.

Then (12.3) becomes

$$t\,(df = N - 1) = \frac{(\bar{D} - \mu_D)\sqrt{N - 1}}{s_D} \qquad (12.9)$$

This t may be used to test any hypothesis about $\mu_D = \mu_1 - \mu_2$. If we
are concerned specifically with the hypothesis that $\mu_D = \mu_1 - \mu_2 = 0$, we
may write (12.9) as follows:

$$t\,(df = N - 1) = \frac{\bar{D}\sqrt{N - 1}}{s_D} \qquad (12.10)$$
Example. Consider a re-run of the psychological problem of Experiment
III involving eleven randomly selected pairs one member of which is
assigned to the punish-both (PB) condition and the other member of which
is assigned to the punish-failure-only (PF) condition. Assume the data to
be as shown in Table 12.3. The solution is as follows:

STEP 1. H: $\mu_D = 0$; alternatives: $\mu_D > 0$ and $\mu_D < 0$

STEP 2. α = .01
An α of .001 was used in Experiment III. Here, as in the preceding
example, we have used .01.

STEP 3. R: t ≤ -3.17 and t ≥ +3.17*
(Note: df = N - 1 = 11 - 1 = 10)

*See Table VI, Appendix C.




TABLE 12.3 Criterion Scores and Differences Between Them for
11 Matched Pairs in Experiment III on the Effect
of Punishment on Speed of Learning

Pair    PB    PF      D
  1     24    37    -13
  2     20    35     -6
  3     19    16     +3
  4     14    26    -12
  5     30    23     +7
  6     19    27     -8
  7     19    30    -11
  8     20    20      0
  9     16    28    -12
 10     11    24    -13
 11     11    21    -10

$\sum D$ = -75
$\bar{D}$ = -6.82
$\sum D^2$ = 1,005
$(\sum D)^2/N$ = 511.3636
$\sum d^2$ = 493.6364
$s_D^2$ = 44.8760
$s_D$ = 6.70

STEP 4. Calculation of t for data at hand.
Using (12.10) we obtain

$$t = \frac{(-6.82)\sqrt{11 - 1}}{6.70} = -3.22$$

STEP 5. Decision. Reject H. (Why?)

Note that this decision also implies rejection of the alternative $\mu_D > 0$.
(Why?) Hence, the only remaining possibility is $\mu_D < 0$, indicating that
the PB condition is more effective in reducing the number of trials required
for learning than the PF condition.
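The calculation called for in Step 4 can be carried out as follows. This is a sketch under one stated assumption: the eleven differences are taken to be the set consistent with the sums reported beside Table 12.3 ($\sum D$ = -75 and $\sum d^2$ = 493.6364). The function name is illustrative.

```python
from math import sqrt

def paired_t(differences, hypothesized_mean=0.0):
    """t of (12.9): (mean D - mu_D) * sqrt(N - 1) / s_D, with s_D computed
    using divisor N."""
    n = len(differences)
    mean_d = sum(differences) / n
    s_d = sqrt(sum((d - mean_d) ** 2 for d in differences) / n)
    t = (mean_d - hypothesized_mean) * sqrt(n - 1) / s_d
    return t, n - 1, mean_d, s_d

# D column (PB minus PF) as assumed above; the values sum to -75.
d_scores = [-13, -6, 3, -12, 7, -8, -11, 0, -12, -13, -10]
t, df, mean_d, s_d = paired_t(d_scores)
critical_t = 3.17   # tabled value for df = 10, two-ended .01 level
print(round(mean_d, 2), round(s_d, 2), round(t, 2), abs(t) >= critical_t)
```

Because the computed t falls below -3.17, it lies in the lower portion of the critical region, which is the basis for the decision in Step 5.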
In this example we clearly have a more powerful test than in the
preceding example since it "saw" or "interpreted" a $\bar{D}$-value of -6.82 as
sufficiently different from zero (the value hypothesized) to warrant
rejection of zero as a possible value of $\mu_D$, whereas the test of the preceding
example did not "see" or "interpret" a D-value of -8 as sufficiently
different from zero to warrant such a rejection. As was explained in the
discussion relating to Experiment III (Section 10.21), this increase in power is
due to the decrease in $\tilde{\sigma}_{\bar{D}}$ which results from controlling one of the factors
(in our example, the factor of intelligence) contributing to individual
differences in learning scores. It should be noted, however, that in order to keep
the number of degrees of freedom the same in both examples it was
necessary to employ more subjects in the latter example (11 pairs or 22
subjects) than in the former (12 subjects). Had the same number of
subjects (12) been used in both examples, the latter would have involved only
six pairs of subjects and consequently only five degrees of freedom. Unless
the control variable is extremely effective in reducing the standard error,
this loss in degrees of freedom may result in a loss in power which would
negate any gain in power resulting from the matching procedure.

To provide for a comparison of the powers of these two t-tests with their
large-sample counterparts, the four power curves are shown in Figure 12.3.
So that the comparisons would be on the same bases throughout, the .01 level
of significance was adopted for the large-sample as well as for the small-
sample tests. Also it was assumed that both populations had the
variance 25. Finally, apropos the tests based on matched pairs, it was
assumed that the effect of the factor controlled was such as to make a 20
per cent reduction in the standard error of the $\bar{D}$-sampling distribution. It
naturally follows that the large-sample power curves will differ somewhat
from those shown in Figure 10.12.

FIGURE 12.3 Power curves for two t-tests and two large-sample tests of the
means of two populations
(vertical axis: P-scale; horizontal axis: $\mu_1 - \mu_2$ scale, marked at -14.8, -8.6, -4.2, -3.4, 0, 3.4, 4.2, 8.6, 14.8)

From the power curves shown in Figure 12.3, it is again clearly evident
that the small-sample tests do not offer much protection against Type II
errors unless the true value of $\mu_1 - \mu_2 = \Delta$ differs markedly from the
hypothesized value of zero. For the t-test based on independent random
samples of 7 and 5 cases, Δ must differ from zero by 14.8 in order for the
probability of a Type II error (β) to be reduced to .05. The corresponding
amount in the large-sample test based on independent random samples of
50 is only 4.2. When matched samples are used, Δ must differ from zero
by 8.6 in order to reduce β to .05 in the case of a t-test based on 11 pairs;
the corresponding amount in the case of a large-sample test based on 50
pairs is 3.4.
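The two large-sample figures (4.2 and 3.4) can be approximated from the normal curve alone. The sketch below rests on simplifying assumptions that are not in the text: the standard error is treated as known, each independent sample has 50 cases from populations with variance 25, and the matched design with 50 pairs is taken to have a standard error 20 per cent smaller than that of the independent design.

```python
from math import sqrt
from statistics import NormalDist

def detectable_difference(std_error, alpha=0.01, beta=0.05):
    """Difference from zero at which a two-ended normal-curve test at level
    alpha has Type II error probability beta (the negligible chance of
    rejecting in the wrong tail is ignored)."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(1 - beta)) * std_error

variance = 25.0
n = 50
se_independent = sqrt(variance / n + variance / n)   # two independent samples of 50
se_matched = 0.8 * se_independent                     # assumed 20 per cent reduction

delta_independent = detectable_difference(se_independent)
delta_matched = detectable_difference(se_matched)
print(round(delta_independent, 1), round(delta_matched, 1))
```

The small-sample figures (14.8 and 8.6) cannot be obtained this way, since they depend on the power function of the t-test, whose construction the text places beyond its scope.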

The power superiority of the matched-sample t-test over the t-test
involving independent random samples is here due primarily to the fact that
more subjects were studied. The power curve for a matched-sample t-test
involving 12 subjects (6 pairs) is not shown in Figure 12.3 for the reason
that it is so nearly the same as that for the t-test based on independent
random samples of seven and five cases that the two curves could not have
been distinguished in a graph drawn to this scale. This implies that
for samples of the size involved, a 20 per cent reduction in standard
error is fully offset by the reduction in degrees of freedom from 10
($n_1 + n_2 - 2 = 10$) to 5 (N - 1 = 6 - 1 = 5). That is to say, when dealing
with samples of this order of size, the matching design would not result in
increased power unless it also resulted in a reduction in the size of the
standard error of considerably more than 20 per cent.
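A rough arithmetic check of this offsetting is possible if the product of the critical t and the standard error is taken as a crude index of the smallest difference a test can detect. That index is an assumption made here for illustration, as is the treatment of the standard error as known rather than estimated. The critical values 3.17 (df = 10) and 4.03 (df = 5) are the usual two-ended .01-level table entries, and the standard deviation 5 follows from the variance 25 assumed for Figure 12.3.

```python
from math import sqrt

sigma = 5.0   # population standard deviation (variance 25)

# Independent random samples of 7 and 5 cases: df = 10.
se_independent = sigma * sqrt(1 / 7 + 1 / 5)
index_independent = 3.17 * se_independent

# Six matched pairs (12 subjects): df = 5. Without matching, the standard
# error of the mean difference would be sigma * sqrt(2 / 6); matching is
# assumed to reduce it by 20 per cent.
se_matched = 0.8 * sigma * sqrt(2 / 6)
index_matched = 4.03 * se_matched

print(round(index_independent, 2), round(index_matched, 2))
```

The two indices come out nearly equal, which agrees with the statement that the two power curves are practically indistinguishable.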

In concluding this section, it is important to recall remarks made in
Section 10.21 to the effect that the sampling routine necessary to make this
matching design valid in a real-world situation is difficult to achieve, and that
by far the most common application of (12.9) is consequently to be found in
situations in which the two experimental conditions are such that they may
both be applied to the same individual, or in situations in which the concern
is with the same individual before and after the administration of some
treatment or experimental condition. In such situations, of course, the
scores in a pair are both derived from the same individual and (12.9) is
appropriate as a test statistic for testing hypotheses about the mean of a
population of differences between such pairs of scores.


12.8 CONCLUDING REMARKS ON THE USE OF t AS A TEST STATISTIC

It is important that the student appreciate fully the price implicit in
the use of small samples. Before considering this price, however, it will be
helpful to compare the large-sample theory treated in Chapters 9 and 10
with the small-sample theory just described.
When the hypothesis under test has to do with the value of the mean
of a population either of X-scores or of D-values, the large-sample test
statistic z is computed by the same formula as the small-sample test
statistic t. In other words, for a given sample the values of z and t will
be identical. For a true hypothesis, the z-value is interpreted as a random
variable which is normally distributed with a mean of zero and a variance
of one. Even if the population from which the sample is drawn is itself
normally distributed, this interpretation is approximate owing to the use
of an estimate of the standard error of the sampling distribution of
$\bar{X}$ or $\bar{D}$ in the computation of z. The t-statistic on
the other hand is interpreted under the same circumstances as a random
variable which is distributed as that member of the family of t-curves for
which the df-value is the same as that of $\tilde{\sigma}_{\bar{X}}$ or $\tilde{\sigma}_{\bar{D}}$. The theory takes into
account the use of the estimates $\tilde{\sigma}_{\bar{X}}$ and $\tilde{\sigma}_{\bar{D}}$ in the computation of t, and
the interpretation is exact.

Suppose now that the population from which the sample is drawn is
not normally distributed. Then a second source of inexactness enters into
the interpretation of z, because of the fact that the $\bar{X}$ (or $\bar{D}$) sampling
distribution only tends toward or approaches a normal distribution as N
increases. Under this circumstance, the interpretation of t also becomes
inexact for precisely the same reason. Even so the interpretation of t for a
sample of a given size is less inexact than that of z for a sample of this size,
since the interpretation of z is approximate in character on two counts,
while that of t is approximate in character on only one. As sample size
increases, the approximate character of the interpretation of t which is due
to the non-normality of the population becomes less and less a matter of
concern. In fact when N becomes quite large, say fifty or more, the
interpretation of z (an interpretation which is approximate both because of the
non-normality of the population and the use of an estimated standard error)
becomes sufficiently accurate to provide a practicable test of hypotheses
about means. This, of course, is the large-sample theory treated in Chapters
9 and 10. In other words, this large-sample theory is actually a special case
of the t-theory, the normally distributed z being that member of the family
of t-curves for which df = ∞. The approach of the form of the t-distribution
to that of the z-distribution is quite rapid, so that even for df-values as
small as 30 the z-distribution provides a useful approximation of the t-
distribution unless a high degree of accuracy is required. Thus we see that
large-sample theory as applied to tests of hypotheses about the value of a
population mean actually amounts to the use of the normal-distribution
approximation of any t-distribution for which df > 30.
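The contrast between the approximate z interpretation and the exact t interpretation can be illustrated by simulation. The sketch below is not from the text: the population (normal, mean 100, standard deviation 16), the number of trials, and the seed are arbitrary choices, and the critical value 4.60 is the tabled two-ended .01-level t for df = 4 used elsewhere in this chapter.

```python
import random
from math import sqrt
from statistics import NormalDist

def rejection_rates(n=5, trials=20000, t_critical=4.60, seed=1):
    """Draw samples from a normal population for which the hypothesis is true
    and count how often |statistic| reaches the z and the t critical values."""
    rng = random.Random(seed)
    z_critical = NormalDist().inv_cdf(0.995)        # two-ended .01 level
    reject_z = reject_t = 0
    for _ in range(trials):
        sample = [rng.gauss(100, 16) for _ in range(n)]
        mean = sum(sample) / n
        s = sqrt(sum((x - mean) ** 2 for x in sample) / n)
        statistic = (mean - 100) * sqrt(n - 1) / s  # same formula for z and t
        reject_z += abs(statistic) >= z_critical
        reject_t += abs(statistic) >= t_critical
    return reject_z / trials, reject_t / trials

rate_z, rate_t = rejection_rates()
print(rate_z, rate_t)
```

With samples of 5, referring the statistic to the normal curve rejects a true hypothesis several times more often than the nominal 1 time in 100, while referring it to the t-curve for df = 4 keeps the rate close to .01.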


When the hypothesis involved has to do with the difference between
the means of two populations and the test is based on the use of independent
random samples, the situation is altered somewhat owing to the fact that
different estimates of the standard error of the $\bar{X}_1 - \bar{X}_2$ sampling
distribution are in general used in computing t and z.† However, in the case in
which $n_1$ and $n_2$ are equal, the two standard-error estimates are the same
and in this situation, therefore, the remarks of the foregoing paragraphs
still apply. In general the t-curve model is exact only if the two populations
involved are (1) normally distributed and (2) equally variable. The normal-
curve model would be exact only if the populations were (1) normally
distributed and (2) if their variances were known. Since in practical work the
population variances will not be known, the normal-curve model will
always be approximate because of the use of an estimated standard error.
If the populations are not normally distributed, then both curves provide
models which are approximate, the normal-curve model now becoming
approximate on two counts. If the populations are not equally variable
the t-curve model also becomes approximate on two counts. If the
populations differ markedly in variability, and if $n_1$ and $n_2$ are large and differ
substantially, the normal-curve model (z-test) is somewhat more
appropriate than the t-curve model.

The situations discussed in the foregoing paragraphs are summarized
in Table 12.4.

*We have recommended against N < 50. See Section 9 1
†Compare formulas (9.28) and (12.6).


TABLE 12.4 Summary Comparison of t and Normal Curves
with Regard to Characteristic of Exactness

HYPOTHESIS       CONDITIONS                  t-CURVE MODEL                z-CURVE (NORMAL) MODEL
ABOUT:

$\mu$            Population normally         Exact                        Approximate because of:
                 distributed                                              (1) use of $\tilde{\sigma}_{\bar{X}}$

$\mu$            Population non-normal       Approximate because of:      Approximate because of:
                                             (1) non-normality            (1) use of $\tilde{\sigma}_{\bar{X}}$
                                                 of population            (2) non-normality
                                                                              of population

$\mu_1 - \mu_2$  Populations normally        Exact                        Approximate because of:
                 distributed and equally                                  (1) use of $\tilde{\sigma}_{\bar{X}_1 - \bar{X}_2}$
                 variable

$\mu_1 - \mu_2$  Populations non-normal      Approximate because of:      Approximate because of:
                 but equally variable        (1) non-normality            (1) use of $\tilde{\sigma}_{\bar{X}_1 - \bar{X}_2}$
                                                 of populations           (2) non-normality
                                                                              of populations

$\mu_1 - \mu_2$  Populations normally        Approximate because of:      Approximate because of:
                 distributed but not         (1) inequality of            (1) use of $\tilde{\sigma}_{\bar{X}_1 - \bar{X}_2}$
                 equally variable                population
                                                 variances

$\mu_1 - \mu_2$  Populations non-normal      Approximate because of:      Approximate because of:
                 and not equally variable    (1) non-normality            (1) use of $\tilde{\sigma}_{\bar{X}_1 - \bar{X}_2}$
                                                 of populations           (2) non-normality
                                             (2) inequality of                of populations
                                                 population
                                                 variances


As previously explained, when samples are large the approximations
provided by the normal-curve model are sufficiently accurate for practical
purposes. Statisticians have given much attention to the accuracy of the
t-curve model when samples are small. It is the opinion of
experienced statisticians that no very serious error results from the
application of the t-curve model in the case of non-normal populations when the
critical region used is two-ended. At least the degree of control over a
Type I error is not believed to be seriously affected, and while some loss in
power is almost certain to result, such loss as may occur cannot matter much
in view of the fact that these small-sample tests are in any case capable of
detecting only very gross discrepancies between hypothesis and parameter
with any degree of consistency. When one-ended critical regions are used,
the t-test is far more vulnerable to the effects of non-normality, especially
skewness. If a one-ended region is required, t-tests should be used with
small samples only in situations in which it is reasonable to assume that the
population distribution at least approaches normality in form.

It is also known that inequality of population variances does not
seriously affect the validity of the t-test of (12.8) so long as the inequality is
not extreme. There is little point, however, in using such t-tests when it
is not possible to assume a reasonable degree of equality between the
population variances, for other test techniques are available which are not
subject to this restriction.* Thus it is clear that t-test theory in the case
of small samples is somewhat restricted in the generality of its applicability,
and the investigator who uses small samples must face the fact that an
analysis based on this theory is appropriate only when the conditions under
which it is exact are at least satisfied to the extent indicated above.
But the most costly aspect of the use of small samples, even in
situations in which the conditions necessary to making t-test theory exact are
satisfied, lies in their extreme lack of power as compared with large samples.
While the appropriate application of t-test theory to small samples does
provide for exact control over a Type I error, it cannot be expected, on the
basis of the limited information inherently contained in such samples, to
detect consistently a discrepancy between the parameter and the value
hypothesized for it unless that discrepancy is very large. Of course, if Type
II errors are of concern only when the difference between parameter and
hypothesis becomes very great, then the use of small samples may prove
practicable. In general, however, the use of a small sample is justifiable
only in situations in which circumstances are such as to preclude the use of
a large sample.

*A discussion of these techniques is beyond the scope of this text. For an
example see W. G. Cochran and G. M. Cox, Experimental Designs (Second
edition; New York: John Wiley & Sons, Inc., 1957) pp. 100-102.




12.9 INTERVAL ESTIMATION BASED ON THE t-STATISTIC


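Let S be a normally distributed statistic with mean $\mu_S$ and standard
error $\sigma_S$, and let it be required to establish the limits, $\underline{\mu}_S$ and $\overline{\mu}_S$, of the
100γ per cent confidence interval for $\mu_S$ (see Figure 12.4). In this situation

$$z_{\gamma/2} = \frac{\mu_S - C}{\sigma_S} = \frac{D - \mu_S}{\sigma_S} = \frac{c}{\sigma_S} = \frac{d}{\sigma_S}$$

Hence, $c = d = z_{\gamma/2}\,\sigma_S$ and applying (11.1) we obtain

$$\underline{\mu}_S = S_1 - z_{\gamma/2}\,\sigma_S$$
$$\overline{\mu}_S = S_1 + z_{\gamma/2}\,\sigma_S \qquad \text{(a)}$$

where $S_1$ is the particular value of S which arises in the case of a particular
sample.

FIGURE 12.4 Sampling distribution of S
(S-scale, with points C and D lying at distances c and d on either side of $\mu_S$)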


Now suppose that $\sigma_S$ is not known but that an estimate of it, $\tilde{\sigma}_S$, is
obtainable from the information contained in the particular sample at
hand. If we use this estimate in place of $\sigma_S$ in (a) we have

$$\underline{\mu}_S = S_1 - z_{\gamma/2}\,\tilde{\sigma}_S$$
$$\overline{\mu}_S = S_1 + z_{\gamma/2}\,\tilde{\sigma}_S \qquad \text{(b)}$$

We can no longer claim, of course, that the $\underline{\mu}_S$- and $\overline{\mu}_S$-values given in
(b) are the limits of a 100γ per cent confidence interval. Such a claim is
precluded by our use of an estimate of $\sigma_S$. If, as we have previously
explained (see Section 11.4), the sample is sufficiently large to provide a
reliable estimate of $\sigma_S$, then the application of (b) leads to limits for which
the actual confidence coefficient is sufficiently close to γ to satisfy the
demands of most practical situations. If, on the other hand, the sample is
small and the estimate of $\sigma_S$ unreliable, the application of (b) leads to
limits for which the actual confidence coefficient differs considerably in
value from that selected for γ. Therefore, in situations involving small
samples, it becomes necessary to alter the procedure. An appropriate
alteration is easily accomplished if the estimate of $\sigma_S$ is one appropriate to
t-distribution theory. If $\tilde{\sigma}_S$ is in fact appropriate to t-distribution theory,
then (12.1) applies, and we may write

$$t_{\gamma/2}\ (\text{for } df \text{ equal to that of } \tilde{\sigma}_S) = \frac{\mu_S - C}{\tilde{\sigma}_S} = \frac{D - \mu_S}{\tilde{\sigma}_S}$$

so that $\mu_S - C = D - \mu_S = t_{\gamma/2}\,\tilde{\sigma}_S$
or $c = d = t_{\gamma/2}\,\tilde{\sigma}_S$

Hence application of (11.1) gives

$$\underline{\mu}_S = S_1 - t_{\gamma/2}\,\tilde{\sigma}_S \qquad (12.11a)$$
$$\overline{\mu}_S = S_1 + t_{\gamma/2}\,\tilde{\sigma}_S \qquad (12.11b)$$

where df for t is that of $\tilde{\sigma}_S$. If the statistic involved is normally distributed,
intervals established by (12.11) have a confidence coefficient which is exactly
equal to γ since the use of t takes into full account the use of an appropriate
estimate of $\sigma_S$.

We shall now use (12.11) to write formulas for the limits of the 100γ
per cent confidence interval for the mean of a normally distributed
population. Here, of course, the statistic represented by $S_1$ is the mean, $\bar{X}_1$, of
the sample at hand, and $\tilde{\sigma}_S$ is $\tilde{\sigma}_{\bar{X}}$ as given by (9.25). Hence,

$$\underline{\mu} = \bar{X}_1 - t_{\gamma/2}\,\frac{s}{\sqrt{N - 1}} \qquad (12.12a)$$
$$\overline{\mu} = \bar{X}_1 + t_{\gamma/2}\,\frac{s}{\sqrt{N - 1}} \qquad (12.12b)$$

where df = N - 1

Example. Using the data of the example of Section 12.6, establish the
99 per cent confidence interval for the mean of the population involved.
Here N = 5 so that df = 4; also $t_{\gamma/2} = t_{.495}$. In the t-table given in Appendix
C, P = .500 - γ/2. Referring to this table for df = 4 and P = .500 - .495
= .005, we find t = 4.60. Since for the given data $\bar{X}_1 = 80$ and s = 17.18,
the application of (12.12) gives

$$\underline{\mu} = 80 - (4.60)\,\frac{17.18}{\sqrt{5 - 1}} = 80 - 39.51 = 40.49$$
$$\overline{\mu} = 80 + (4.60)\,\frac{17.18}{\sqrt{5 - 1}} = 80 + 39.51 = 119.51$$
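The same limits follow from a direct transcription of (12.12). This is only a sketch: the function name is illustrative, and the t-value must still be read from the table, since it is supplied here as a plain number.

```python
from math import sqrt

def mean_confidence_limits(sample_mean, s, n, t_value):
    """Limits of (12.12): sample mean -/+ t * s / sqrt(N - 1), where s is the
    sample standard deviation computed with divisor N."""
    half_width = t_value * s / sqrt(n - 1)
    return sample_mean - half_width, sample_mean + half_width

lower, upper = mean_confidence_limits(sample_mean=80, s=17.18, n=5, t_value=4.60)
print(round(lower, 2), round(upper, 2), round(upper - lower, 2))
```

The length of the interval, upper minus lower, is twice the half-width of 39.51.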


Comment. It will be observed that the length of this interval is 79.02
IQ units as compared with 12.9 IQ units in the case of the corresponding
large-sample estimate (see Example 1, Section 11.4), and again we have
rather striking evidence of the lack of precision of small-sample results as
compared with the precision yielded by more informative large samples.
It is also important to note that the procedure just illustrated is appropriate
in the case of small samples only with reference to normally distributed
populations.

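We shall next write formulas for the limits of the 100γ per cent
confidence interval for the difference between the means of two normally
distributed and equally variable populations. Here the $S_1$ of (12.11) is the
obtained value of $D_1 = \bar{X}_1 - \bar{X}_2$ and $\tilde{\sigma}_D$ is as given by (12.6). Hence,

$$\underline{\Delta} = D_1 - t_{\gamma/2}\sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \qquad (12.13a)$$
$$\overline{\Delta} = D_1 + t_{\gamma/2}\sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \qquad (12.13b)$$

where $df = n_1 + n_2 - 2$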

Example. Using the data of the example under Case I in Section 12.7,
obtain the limits of the 99 per cent confidence interval for the difference
between the means of the populations involved. Here $n_{PB} = 7$ and $n_{PF} = 5$
so that df = 7 + 5 - 2 = 10. Also, $t_{\gamma/2} = t_{.495}$. Referring to the t-table for
df = 10 and for P = .500 - .495 = .005, we find t = 3.17. Since for the
given data $D_1 = \bar{X}_{PB} - \bar{X}_{PF} = 19 - 27 = -8$, and $s_{PB}^2$ and $s_{PF}^2$ are
respectively 24.5714 and 29.2, the application of (12.13) gives

$$\underline{\Delta} = -8 - (3.17)\sqrt{\frac{(7)(24.5714) + (5)(29.2)}{7 + 5 - 2}\left(\frac{1}{7} + \frac{1}{5}\right)} = -8 - 10.46 = -18.46$$
$$\overline{\Delta} = -8 + 10.46 = +2.46$$
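Formula (12.13) can be sketched in the same way. The function name and argument order are illustrative. Carried at full precision, the half-width comes out near 10.47 rather than the 10.46 obtained from the rounded standard error 3.30, so the limits may differ from those above by one unit in the second decimal place.

```python
from math import sqrt

def difference_confidence_limits(mean_1, var_1, n_1, mean_2, var_2, n_2, t_value):
    """Limits of (12.13); var_1 and var_2 are sample variances with divisor n."""
    pooled_var = (n_1 * var_1 + n_2 * var_2) / (n_1 + n_2 - 2)
    half_width = t_value * sqrt(pooled_var * (1 / n_1 + 1 / n_2))
    diff = mean_1 - mean_2
    return diff - half_width, diff + half_width

lower, upper = difference_confidence_limits(19, 24.5714, 7, 27, 29.2, 5, t_value=3.17)
print(round(lower, 2), round(upper, 2), lower < 0 < upper)
```

The interval contains zero, which is the interval-estimate counterpart of retaining the hypothesis of no difference at the .01 level.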

Comment. It is important to note that these limits are opposite in
sign. The negative sign associated with the lower limit indicates a difference
in favor of the PB condition while the positive sign of the upper limit indi-
cates a difference favoring the PF condition.* The fact that these limits
lie on opposite sides of zero is consistent with our previous finding (Section
12.7) that the zero hypothesis could not be rejected. It is also important
for the student to note that the applicability of these limits is restricted
not only to normally distributed populations but also to equally variable
populations.

*It should be noted that in the psychological experiment here involved, the
smaller criterion scores indicated the superior performance, i.e., more rapid
learning.

Finally we shall consider the problem of determining the limits of the
$100\gamma$ per cent confidence interval for the mean of a normally distributed
population of differences resulting from forming a random sample of pairs.
Of course, (12.12) applies. Nevertheless, we shall rewrite the limits in terms
of the notation previously defined for differences.

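$$\underline{\mu}_D = \bar{D} - t_{\gamma/2}\,\frac{s_D}{\sqrt{N - 1}} \qquad (12.14a)$$

$$\overline{\mu}_D = \bar{D} + t_{\gamma/2}\,\frac{s_D}{\sqrt{N - 1}} \qquad (12.14b)$$

where $df = N - 1$ and $N$ is the number of pairs.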


Example. Using the data of the example under Case II in Section 
12.7, obtain the limits of the 95 per cent confidence interval for the mean 
of the population of differences involved. 

Here, $N = 11$ so that $df = 10$. Also $t_{\gamma/2} = t_{.475}$. Referring to the
t-table for $df = 10$ and for $P = .500 - .475 = .025$, we find $t = 2.23$. Since
for the given data $\bar{D} = -6.82$ and $s_D = 6.70$, the application of (12.14)
gives

$$\underline{\mu}_D = -6.82 - (2.23)\frac{6.70}{\sqrt{10}} = -6.82 - 4.73 = -11.55$$

$$\overline{\mu}_D = -6.82 + 4.73 = -2.09$$
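The same kind of check applies to the limits for a mean of differences. The
sketch below is an added illustration only; `mean_difference_limits` and its
argument names are assumptions, and the standard deviation is taken as
computed with divisor N, so that the standard error is $s_D/\sqrt{N - 1}$ as
in (12.14).

```python
import math

def mean_difference_limits(mean_d, sd_d, n_pairs, t_crit):
    """Limits of the confidence interval for the mean of paired differences.

    sd_d is the standard deviation of the differences computed with divisor N.
    """
    std_error = sd_d / math.sqrt(n_pairs - 1)
    return mean_d - t_crit * std_error, mean_d + t_crit * std_error

# Figures of the example: mean difference -6.82, s_D = 6.70, 11 pairs, t = 2.23.
lower_d, upper_d = mean_difference_limits(-6.82, 6.70, 11, 2.23)
print(round(lower_d, 2), round(upper_d, 2))
```

Both limits are negative, so this interval, unlike the preceding one, does not
include zero.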




CORRELATION 


13.1 INTRODUCTION TO THE CONCEPT OF CORRELATION 


Suppose that for each of a number of individuals or objects we have
measures (i.e., scores) of two characteristics or dimensions. For example, for
each of a number of squares we may have measures of perimeter and side,
or for each of a number of twelve-year-old boys measures of height and
weight, or for each of a number of college sophomores measures of high
school achievement and freshman-year college achievement. We shall be
concerned here with the tendency for the pairs of measures to correspond
in relative magnitude, that is, to have the same relative position in their
respective distributions. In other words, we shall be concerned with the
degree to which individuals or objects which are average, above average, or
below average in one dimension tend also to be average, above average, or
below average respectively in the other dimension. We shall refer to such
a tendency as correlation.

The correlation between the perimeters and sides of squares is an
example of perfect correlation. Since the perimeter of any square is four
times the length of its side (P = 4S) it obviously follows that in any collec-
tion of squares, the one with the largest side will have the largest perimeter,
the one with the second largest side will have the second largest perimeter,
and so on. The correlation between heights and weights of twelve-year-old
boys, on the other hand, is not perfect. It is a matter of common observa-
tion that twelve-year-old boys who are average, above average, or below
average in height tend to be average, above average, or below average
respectively, in weight. Yet, exceptions to this tendency are not uncom-
mon, and it would not be at all unusual to discover in a given collection of
twelve-year-old boys that the tallest was not also the heaviest. The situa-
tion is, of course, similar in the case of measures of high school achievement
and freshman-year college achievement.

Dimensions or characteristics exist between which there is no perceptible
correlation. This is the case, for example, with a measure of intelligence
such as IQ and a measure of some physical dimension such as height
for a population of fifth-grade boys. Such variables are sometimes said to 
be uncorrelated. Uncorrelated dimensions or variables are characterized by 
the fact that large, small, or average values of one occur with the same 
relative frequency with all values of the other. 

Though still other situations exist,* we shall at this point refer to only
one, namely, the tendency for individuals who are above average in one
dimension to be below average in the other, while those who are below
average in the first tend to be above average in the second. Such dimensions
are said to be inversely correlated, rather than directly correlated as in
the situations first described. For the children in the seventh grade of almost
any elementary school, for example, chronological age and scholastic ability
are likely to be correlated inversely, that is, the over-age children in the
grade are usually among the dullest, while the youngest children are
usually among the brightest. This follows from the fact that dull children
have been retarded and the bright children accelerated in their school
progress. For a reason which will become apparent later, statisticians refer
to variables which are correlated inversely as being negatively correlated and
to variables which are correlated directly as being positively correlated.

*An example of another will be presented later. (See Table 13.13.)

Suppose that the "objects" under consideration consist of a number of
trips between two cities, A and B, which are 100 miles apart and that the
two dimensions involved are time required and rate (miles per hour)
traveled. Again we have an example of perfect correlation. Here, however,
the correlation is negative instead of positive as in the case of perimeters
and sides of squares. Since the time of any trip is 100 divided by the rate
(t = 100/r), it obviously follows that the trip for which the time was great-
est is that for which the rate was least, that the trip for which the time was
second greatest is that for which the rate was second least, and so on. Thus,
a perfect negative as well as a perfect positive correlation may exist.

In statistical work we actually have no concern at all with dimensions
or variables which are perfectly correlated either positively or negatively.
Our concern, instead, has to do entirely with variables which only tend to
correspond (either positively or negatively) in relative magnitude, that is,
with variables which are not perfectly correlated. Essentially, the
correlation problem in statistics is one of assessing the degree to which 
imperfectly correlated variables are correlated. We need some means of 
answering such questions as the following: 


1. Is the correlation between height and weight for twelve-year-old boys greater 
than that for adult males? 
2. Which of the following variables is most closely correlated with first-year 
college grade-point average? 
(a) High school grade-point average. 
(b) Rank in high school graduating class. 
(c) Intelligence as measured by an individual test such as Wechsler’s. 
(d) Intelligence as measured by some group test such as the Henmon-Nelson 
Tests of Mental Ability (for grades 9-12).* 
(e) Performance on high school tests of general educational development
such as the ITED battery.†
3. To what extent is success as an office secretary correlated with performance
on some test of English grammar?
4. To what extent are the weekly sales at a grocery store correlated with
weekly expenditures for newspaper advertisements? for radio advertisements?
The situations represented in these questions are, of course, but a few
of the many which call for some assessment of the degree to which
imperfectly correlated variables are correlated. Some quantitative index of
degree of correlation would obviously be most useful. This chapter is
primarily concerned with the development and interpretation of such an
index.


*The Henmon-Nelson Tests of Mental Ability, Houghton Mifflin Company, Boston,
Massachusetts.

†Iowa Tests of Educational Development, Iowa Testing Programs, State University
of Iowa, Iowa City, Iowa.


13.2 THE SCATTER DIAGRAM


Before we present a quantitative index of correlation we shall consider
a scheme for displaying graphically the degree of correlation between two
variables. This device, known as the scatter diagram or dot chart, does not
provide the needed quantitative index referred to in the foregoing section.
It does, however, provide for a simple pictorial presentation of a given cor-
relational situation which may be readily understood, even by one not
technically trained in statistics.

TABLE 13.1 Perimeters (P) and Sides (S) of 10 Squares

Figure 13.1 Scatter diagram of perimeters and sides of ten squares
(axes: P-scale and S-scale)

Table 13.1 gives the perimeters and sides of a collection of 10 squares.
Figure 13.1 shows the scatter diagram for these 10 pairs of values. Figure
13.1 was constructed by marking off P- and S-scales along rectangular
axes and locating each point where the perpendiculars erected from the
individual values on their respective scales intersect. It is customary to show
only the points. The perpendiculars are shown in Figure 13.1 only for the
pair of dimensions

TABLE 13.2 Heights (H) and Weights (W) of 50 Randomly
Selected 12-Year-Old Boys


Н у 

n й w П ig I w m w 
67 1 i 5 E m 

m owja |a иа юю п 
б |a an | S | & ощ 
m mix ОЕ ЧЕ: 55 75 
ЕА m 58 109 57 78 Б и 
89 | 58 102 57 77 Ыш 

62 195 
62 121 E > i и ie E n 
62 118 т | 7 ж Wu 
62 104 a 1 es i 57 T ы ы 
62 % | 59 P AE TO 8 som 
: 98 57 95 560. 79 52 58 




of the first square of Table 13.1 (see broken lines). It is clear that in this 
case involving perfectly correlated variables the points representing the 
pairs of values are arranged in a straight line. While not all perfectly cor- 
related dimensions follow this straight line pattern (curved line arrangements 
are possible) it is clear that whenever pairs of dimensions do follow this 
pattern they must necessarily be perfectly correlated. That is, the largest 
value of one will always be associated with the largest value of the other, 
and the second largest of the one associated with the second largest of the 
other, and so on. 
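The correspondence of rank orders that defines perfect correlation can be
verified directly for the two perfect cases met so far, the squares (P = 4S)
and the 100-mile trips of Section 13.1 (t = 100/r), the latter being a
curved-line arrangement. The following Python sketch is an added illustration;
the side lengths and rates are hypothetical values chosen for the purpose, and
`rank_order` is an assumed helper name.

```python
def rank_order(values):
    """Positions of the values, listed from the largest value to the smallest."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)

sides = [3.0, 1.5, 7.0, 4.0, 2.5]          # hypothetical sides of five squares
perimeters = [4 * s for s in sides]        # P = 4S

rates = [20.0, 50.0, 25.0, 40.0, 10.0]     # hypothetical rates, miles per hour
times = [100 / r for r in rates]           # t = 100/r for a 100-mile trip

# Perfect positive correlation: both measures put the objects in the same order.
same_order = rank_order(sides) == rank_order(perimeters)

# Perfect negative correlation: one order is exactly the reverse of the other.
reversed_order = rank_order(rates) == rank_order(times)[::-1]

print(same_order, reversed_order)
```

Only the first pair of measures would plot as a straight line; the trip times
and rates plot as a curve, although their rank orders correspond perfectly in
reverse.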

Consider next Table 13.2 which contains pairs of height and weight
scores for 50 randomly selected twelve-year-old boys. The heights and
weights are given to the nearest inch and pound. The pairs of values have
been arranged in order of height and within a given height in order of
weight. The scatter diagram for these pairs of heights and weights is shown
in Figure 13.2. There is clearly a tendency for the height and weight scores
to be positively correlated. However, the correlation is not perfect and the
points corresponding to the pairs of values no longer fall on a straight line.
Yet, the points do tend to fall or scatter about such a line in a sort of
elliptical pattern. (See Figure 13.2.) The more nearly the arrangement
of the points in such a scatter diagram follows a straight-line pattern,
that is, the narrower the elliptical field, the higher is the degree of correla-
tion between the variables involved.

Figure 13.2 Scatter diagram of heights and weights of 50 twelve-
year-old boys (horizontal axis: height in inches (H); vertical axis: weight
in pounds (W))


TABLE 13.3 Scores on Tests of General Mathematical Ability
(M) and Reading Rate (R) Made by 50 College
Freshmen

 M   R     M   R     M   R     M   R     M   R
59  52    54  56    51  26    49  53    47  10
58  49    53  66    50  57    49  49    47  41
58  38    53  60    50  53    49  48    47  37
57  60    53  58    50  50    49  42    47  36
56  62    53  54    50  49    49  33    47  35
56  61    53  36    50  46    48  58    47  33
56  55    52  53    50  45    48  45    46  55
56  48    52  51    50  42    48  39    46  52
55  51    52  48    50  36    47  62    44  41
54  58    52  46    49  59    47  55    41  39


Figure 13.3 Scatter diagram of scores on tests of general mathematical
ability and reading rate (axes: M-scale and R-scale)

The scatter diagram for scores made on tests of general mathematical
ability and reading rate by 50 college freshmen (see Table 13.3) is shown
in Figure 13.3. Comparison of Figures 13.2 and 13.3 clearly shows that
these mathematical-ability and reading-rate scores are not as highly cor-
related as are the height and weight scores for twelve-year-old boys. The
elliptical pattern of dots in Figure 13.3 is much wider than in Figure 13.2.
That is, the dots are more widely scattered away from the line. This
implies that at least in some instances large M-scores must be paired with
relatively small R-scores and small M-scores with relatively large R-scores.

Figure 13.4 Scatter diagram of scores on a reading comprehension test
(R) and heights (H) in centimeters for 50 fourth-grade pupils

Figure 13.4 shows the scatter diagram for reading comprehension test
scores (R) and heights (H) in centimeters of 50 fourth-grade pupils (see
Table 13.4). It is clear that for these pupils there is virtually no correlation
between these variables. Large, medium, and small height measures are
all associated with reading scores of any magnitude. The dots clearly are
not arranged along a line in an elliptical field but rather are scattered about


TABLE 13.4 Scores on a Reading Comprehension Test (R) and
Height (H) in Centimeters of 50 Fourth-Grade
Pupils
H 
R H R H R H R H R 
3 
30 137 24 134 21 141 18 141 14 
30 129 24 130 21 136 18 130 14 
29 132 23 139 21 132 18 124 14 
29 127 23 133 20 139 17? — 139 14 A 
28 136 23 129 20 135 17 133 14 2 
: : 39 
9; 1388 22 142 20 134 16 143 13 n 
26 126 22 140 20 130 16 135 ЇЗ ee 
25 132 99. 128 20 18 16 130 1 jd 
24 138 22 12% 19 и 15 134 11 fs 
24 136 22 125 19 134 i 8l 9 : 


in an area bounded by a circle. Thus we see that the scatter diagram pro-
vides a graphical technique for indicating the degree of correlation between
a pair of variates. Variates for which the correlation is high result in
diagrams in which the field of points is a narrow ellipse. As the correlation
decreases, these elliptical fields widen, becoming circular when there is a
complete lack of correlation.

In all the foregoing examples, save the last, the variates involved were
positively correlated. That is, large, medium, and small values of one
variable tended to be associated respectively with large, medium, and small
values of the other. It should be obvious, however, that the scatter diagram
functions equally well when the variates are negatively correlated. The
only difference is that the line about which the points scatter will slope
downward to the right instead of upward to the right. This, of course,
results from the fact that when variables are negatively correlated large
values of one variable tend to be associated with small values of the other.

We shall present here only one example of a scatter diagram involving
negatively correlated variates. Suppose we have given two pairs of objects
(geometric figures, color patches, etc.). Let the objects of the first pair be
designated A and B and those of the second pair C and D. Now suppose
that the task is to decide whether A and B or C and D are the more similar.
Psychologists* have demonstrated that the actual degree of difference in
similarity (i.e., dissimilarity), physically assessed, is negatively correlated
with the time required for a subject to make a decision, that is, perform
the task. Table 13.5 contains physically assessed dissimilarity scores and
decision-time scores for forty decisions, that is, forty independent per-
formances of the task by a single subject. The objects involved were patches
of gray of varying degrees of luminosity, and decision-times were measured
to the nearest .01 of a second. All measurements were converted to T-
scores (see Section 8.12). The scatter diagram for these data is given in
Figure 13.5. The degree of deviation from a straight line appears about the
same as for pairs of height and weight scores for twelve-year-old boys (see
Figure 13.2). Here, however, the elliptical field slopes downward to the
right, indicating that the correlation is negative.

*For example, see William N. Dember, "The Relation of Decision-Time to
Stimulus Similarity," Journal of Experimental Psychology, Vol. 53 (January
1957), pp. 68-72.


TABLE 13.5 Physically Assessed Dissimilarity Scores (D) and
Decision-Time Scores (T) for 40 Decisions by a
Single Subject

 D   T     D    T      D   T     D   T
68  32    56   53     49  50    41  46
67  36    50   46     49  43    41  44
64  41    524  54     49  42    40  70
64  28    84   852    47  60    38  02
02  44    54   49     40  65    38  0
Û1  40    5438        45  ы     38  58
60  52    52   56     45  49    38  50
50  5     Bl'  47     4$  45    35  70


Figure 13.5 Scatter diagram of 40 pairs of dissimilarity and
decision-time scores (horizontal axis: D-scale; vertical axis: T-scale)


13.3 THE BIVARIATE FREQUENCY DISTRIBUTION


It is not unusual, particularly with large collections of data, to find a
pair of scores both members of which have the same magnitude as the cor-
responding scores of one or more other pairs in the collection. This gives
rise to a difficulty in the preparation of a scatter diagram because of the
fact that only one dot can occupy any given position. This difficulty may
be circumvented by the use of a bivariate frequency distribution or table.


DEFINITION. A bivariate frequency distribution is a scheme for the joint 
presentation of pairs of scores made by the same individuals on two variates 
which shows the frequencies with which the individuals are distributed among 
all possible pairings of the score values for the two variates. 


By way of illustration we shall first consider a simple hypothetical
example. Suppose that one of the variates (X) involves the values 1, 2, 3,
4, and 5 and that the other variate (Y) involves the values 4, 5, 6, and 7.
There are, in all, 20 (i.e., 5 times 4) possible ways in which these X- and Y-
values may be paired. The simplest and most compact method of display-
ing these 20 possible different pairings is provided by a two-way or double-
entry table, the columns of which correspond to the different values of one
variate (X) and the rows to the different values of the other variate (Y).
Any given pairing is then associated with that cell of the table formed by
the intersection of the column and row corresponding to the values involved,
and the frequency with which this pairing occurs in the collection is entered
in the cell. Table 13.6 contains 30 pairs of hypothetical X- and Y-values.
The bivariate frequency distribution for these 30 pairs is shown in Table
13.7. In classifying the pairs it is convenient to make a tally mark in the
cell for each pair falling in it, and then simply count the marks to determine
the frequency for each cell. Ordinarily, of course, only the frequencies are
shown.* It is clear from inspection of Table 13.7 that a bivariate frequency
distribution may be interpreted as a scatter diagram. It should also be
noted that if the cell frequencies are summed by columns and rows, we
obtain the two ordinary single variate frequency distributions for the X-
and Y-scores (see lower and right margins of Table 13.7). In a bivariate
table these single variate distributions are sometimes referred to as marginal
distributions.


TABLE 13.6 Thirty Pairs of Hypothetical Scores
on Variates X and Y

X  Y     X  Y     X  Y
2  6     3  5     2  9
2  4     4  5     3  Û
5  6     3  5     4  5
4  6     8  6     5  7
3  5     3  5     4  6
3  5     1  5     4  6
4  6     3  5     z  $
1  4     2  5     1  4
2  5     4  6     3  6


TABLE 13.7 Bivariate Frequency Distribution of 30 Pairs of
Scores of Table 13.6
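The tallying procedure just described is easily mechanized. The Python sketch
below is an added illustration, not the authors' method; the ten (X, Y) pairs
are hypothetical and are not those of Table 13.6, and the names `cell`, `f_x`,
and `f_y` are assumptions. It counts the frequency of each pairing, prints the
double-entry table with the Y-values as rows and the X-values as columns, and
obtains the marginal distributions by summing rows and columns.

```python
from collections import Counter

# Hypothetical (X, Y) pairs; these are NOT the pairs of Table 13.6.
pairs = [(1, 4), (2, 5), (2, 5), (3, 5), (3, 6),
         (4, 6), (4, 6), (5, 7), (3, 5), (1, 4)]
x_values = [1, 2, 3, 4, 5]      # columns of the double-entry table
y_values = [7, 6, 5, 4]         # rows, largest Y-value at the top

cell = Counter(pairs)           # frequency of each (X, Y) pairing

# Marginal distributions: column sums give f_X, row sums give f_Y.
f_x = {x: sum(cell[(x, y)] for y in y_values) for x in x_values}
f_y = {y: sum(cell[(x, y)] for x in x_values) for y in y_values}

print("Y\\X " + " ".join(f"{x:>3}" for x in x_values) + "  f_Y")
for y in y_values:
    row = " ".join(f"{cell[(x, y)]:>3}" for x in x_values)
    print(f"{y:>3} " + row + f"  {f_y[y]:>3}")
print("f_X " + " ".join(f"{f_x[x]:>3}" for x in x_values) + f"  {len(pairs):>3}")
```

Placing the largest Y-value in the top row makes the printed table read like a
scatter diagram, with a positive correlation running from lower left to upper
right.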

*It is suggested that the student set up for himself a four-row by five-column
table and independently classify the pairs of scores given in Table 13.6. The
result should, of course, be checked against Table 13.7.

If the range of values of either or both variates is large, it may be
desirable to let the columns and rows of the double-entry table correspond
to intervals along the score scales. This results in a "grouped" bivariate
frequency distribution analogous in character to the grouped frequency
distributions described in Chapter 2. Table 13.8 presents a bivariate
frequency distribution of this type. The pairs of scores involved are the
height and weight scores given in Table 13.2. The height scores have been
grouped or classified into intervals of 2 units and the weight scores into
intervals of 10 units.
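Grouping amounts to replacing each score by the interval in which it falls
before tallying. The following Python sketch is an added illustration under
stated assumptions: the six (height, weight) pairs are hypothetical rather
than the data of Table 13.2, the helper name `interval` is invented here, and
the lowest interval limits (52 for heights, 50 for weights) are assumed for
the purpose of the example.

```python
from collections import Counter

def interval(score, width, lowest):
    """Return the (lower, upper) limits of the interval holding an integer score."""
    lower = lowest + ((score - lowest) // width) * width
    return (lower, lower + width - 1)

# Hypothetical (height, weight) pairs, not the full data of Table 13.2.
pairs = [(62, 118), (57, 78), (57, 77), (58, 102), (52, 58), (62, 104)]

# Heights in intervals of 2 units, weights in intervals of 10 units.
grouped = Counter((interval(h, 2, 52), interval(w, 10, 50)) for h, w in pairs)

for (h_int, w_int), f in sorted(grouped.items()):
    print(f"H {h_int[0]}-{h_int[1]}, W {w_int[0]}-{w_int[1]}: f = {f}")
```

The two pairs (57, 78) and (57, 77) differ in weight but fall in the same cell,
which is exactly the loss of detail that grouping accepts in exchange for a
more compact table.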


TABLE 13.8 Grouped Bivariate Frequency Distribution for 
Height and Weight Scores of Table 13.2 
- 
Heights 
995253393935 
S 3 з 8 S YY 3 3$ fwo=H 
130-139 
120-129 
110-119 
100-109 
$ 
5 90-99 
= 
80-89 
70-79 
60-69 
50-59 
fH 2 6 1 n 7 9 2 |5 
@=W 645 69.5 76.0 86.3 95.9 104.5 119.5 | 


(Note: The dots locate the means of the weights in each column and the
circles the means of the heights in each row. These means are the subject of
subsequent comment and may be ignored at this point.)


13.4 AN INDEX OF CORRELATION


Although it is possible to obtain some notion of the degree of correlation
between two sets of measures by inspection of a scatter diagram, the in-
formation thus obtained is not usually precise enough for comparative
purposes. In this section, therefore, we shall consider the problem of defin-
ing a particular quantitative index of the degree of correlation between two
sets of measures for the same individuals.


Table 13.9 shows how the sums of the products of two sets of hypo-
thetical measures vary with changes in the order of some of the sets. The
B-values are exactly the same as the A-values so that A and B are obviously
perfectly correlated. The C-, D-, E-, and F-values are the same as the B-
values except that they are arranged in different order and, hence, bear
differing degrees of correlation with the A-values. The C-values clearly
tend to be positively correlated with the A-values, whereas it is difficult to
detect any systematic correlation whatever between A- and D-values. The
E-values tend to be negatively correlated with the A-values, and the F-
values are perfectly correlated negatively with the A-values. It will be
observed that the sums of products of the AB, AC, AD, AE, and AF pairs
decrease as the correlation shifts by degrees from perfect and positive to
perfect and negative. This suggests that a useful quantitative index of
correlation may be based on the sum of the products of the pairs of values
involved.


TABLE 13.9 Sums of Products of Pairs of Hypothetical Scores
There are two reasons, however, why we cannot use such a sum di-
rectly. In the first place the magnitude of this sum is affected by the unit
of measurement involved. Consider, for example, a set of D'-values which
measure the same characteristic as the D-values but in terms of a different
unit. Let this unit be one tenth that of the D unit; then D' = 10D. That
is, an object having a D-value of 5 will have a D'-value of 50. The D'-
values corresponding to the D-values of Table 13.9 are, therefore, 50, 90,
130, 10, and 70 respectively, and the sum of the products of the AD' pairs is
2,490, a value far larger than any sum shown in Table 13.9 in spite of the
fact that there is no clear indication of any correlation whatever between
the pairs of A- and D'-values. This sum is clearly not comparable to the
sums of Table 13.9 because of the difference in units involved. Hence, if
an index of correlation based on the sum of the products of the pair values
is to be useful for the purpose of comparing the degree of correlation between
different sets of variates, some way must be found of expressing the pair
values of the different sets in terms of comparable units. This is accom-
plished by simply expressing all original or raw-score values as z-scores by
application of (7.1) or (7.2). For example, the mean and standard deviation
of both the A- and D-scores of Table 13.9 are 7 and 4 respectively, and the
mean and standard deviation of the D'-scores are 70 and 40. Applying (7.2)
we obtain the A, D, and D' z-scores shown in Table 13.10. The D and D'
z-values are seen to be identical and obviously the AD and AD' z-score
products must also be identical.
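The effect of the unit of measurement, and its removal by conversion to
z-scores, can be reproduced numerically. The Python sketch below is an added
illustration. The A-values 13, 9, 7, 5, 1 and D-values 5, 9, 13, 1, 7 are not
quoted from Table 13.9 itself; they are inferred from the D'-values, the means
and standard deviations, and the z-scores stated in the text, and should be
read as an assumption. The helper `z_scores` is likewise an assumed name and
uses the standard deviation with divisor N.

```python
import math

def z_scores(values):
    """Convert raw scores to z-scores, using the standard deviation with divisor N."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

# Values inferred from the text (mean 7, standard deviation 4 for A and for D).
A = [13, 9, 7, 5, 1]
D = [5, 9, 13, 1, 7]
D_prime = [10 * d for d in D]          # same characteristic, unit one tenth as large

sum_AD = sum(a * d for a, d in zip(A, D))
sum_AD_prime = sum(a * d for a, d in zip(A, D_prime))
print(sum_AD, sum_AD_prime)            # raw sums differ by the factor of 10

# After conversion to z-scores the change of unit disappears.
print(z_scores(D))
print(z_scores(D_prime))
```

The raw sum of products for the AD' pairs agrees with the 2,490 given in the
text, while the two sets of z-scores printed at the end are the same.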


TABLE 13.10 z-Scores for the A- and D-Values of Table 13.9 and
for D'-Values where D' = 10D

  z_A      z_D     z_A z_D     z_D'    z_A z_D'
 +1.5     -0.5     -0.75      -0.5     -0.75
 +0.5     +0.5     +0.25      +0.5     +0.25
  0       +1.5      0         +1.5      0
 -0.5     -1.5     +0.75      -1.5     +0.75
 -1.5      0        0          0        0

 Sum               +0.25               +0.25


But sums of products cannot always be used directly as an index of 
correlation for the simple reason that the magnitudes of such sums depend 
in part on the number of pairs upon which they are based. Any direct 
comparison of sums of products, would, therefore, require that the sums 
compared be based on the same number of pairs—a completely impractical 
restriction. This difficulty is easily overcome. It is necessary only to use the 
product per pair, or mean product, instead of the total product. Thus we 
arrive at the following definition of an index of the correlation (r) between 
pairs of measures X and Y, for the same individuals or objects. 


$$r = \frac{\sum z_X z_Y}{N}$$


This index was created by an English statistician, Karl Pearson, and is
known as the Pearson product-moment correlation coefficient. It is also
variously referred to as a Pearson r, a simple r, or an ordinary r.
Table 13.11 shows the pairs of z-values, their products, the sums of their
products, and the values of r for each of the hypothetical sets of scores of
Table 13.9.
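The definition of r as a mean z-score product translates directly into a few
lines of code. The following Python sketch is an added illustration, not the
authors' computing routine; `z_scores` and `pearson_r` are assumed names. The
A- and D-values are inferred from figures given in the text rather than quoted
from Table 13.9, and F is taken to be the A-values in reverse order, which is
the only rearrangement of these particular values that is perfectly and
negatively correlated with A.

```python
import math

def z_scores(values):
    """z-scores based on the standard deviation computed with divisor N."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def pearson_r(xs, ys):
    """Product-moment r: the mean of the products of paired z-scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

A = [13, 9, 7, 5, 1]      # inferred from the text, see lead-in
B = list(A)               # B-values are the same as the A-values
D = [5, 9, 13, 1, 7]      # inferred from the D'-values 50, 90, 130, 10, 70
F = A[::-1]               # assumed: the A-values in reverse order

r_AB = pearson_r(A, B)
r_AD = pearson_r(A, D)
r_AF = pearson_r(A, F)
print(r_AB, r_AD, r_AF)
```

The value for the AD pairs is the sum +0.25 of Table 13.10 divided by the five
pairs, a coefficient close to zero.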


13.5 SOME PROPERTIES OF r


If in each pair of X's and Y's the z-score values are the same, the cor-
relation is obviously perfect and positive. In this situation the sum of the
z-score products becomes the sum of the squares of a complete set of z-scores.
But (7.7) indicates that this sum must necessarily equal the number of
z-scores (i.e., pairs) involved and in this situation the coefficient r therefore
assumes the value +1 (see r for A and B in Table 13.11). That is, if
$z_{X_i} = z_{Y_i}$, then

$$r = \frac{\sum z_X z_Y}{N} = \frac{\sum z_X^2}{N} = \frac{N}{N} = +1$$

Moreover, if in each pair of X's and Y's the z-scores have the same
absolute values but are opposite in sign, the correlation is perfect and nega-
tive. In this situation r assumes the value -1 (see r for A and F in Table
13.11). That is, if $z_{X_i} = -z_{Y_i}$, then

$$r = \frac{\sum z_X z_Y}{N} = \frac{-\sum z_X^2}{N} = \frac{-N}{N} = -1$$


TABLE 13.11 z-Scores, Products, Sums, and r-Values for Pairs of Sets
of Hypothetical Scores of Table 13.9


Now consider a collection of pairs of measures for which the correlation
is positive and high but not perfect. This is equivalent to saying that most
individuals (pairs) which are above the mean on one measure are also above
the mean on the other, or that only a relatively few are above the mean on
one measure and below the mean on the other (see scatter diagram for
r = .9, Figure 13.6). In this situation most of the pairs will consist of either
two positive or two negative z-scores so that most of the products will be
positive. Moreover, since the correlation is high, many of these positive
products will be quite large, because high z-scores for one variate tend to
be paired with high z-scores for the other, and low z-scores (large negative)
for one variate paired with low z-scores for the other. For the entire collec-
tion the sum of the positive products will greatly exceed the sum of the
negative products, and, hence, the over-all algebraic sum of products will
be positive. Of course, since the correlation is not perfect this sum will
necessarily be some value less than N so that r, the mean z-score product,
will be some positive value less than 1.

Next, suppose the correlation is positive but low. This means that,
while again most individuals above average in one measure are above
average in the other, and vice versa, there will now be a larger number of
instances in which individuals above average on one measure are below
average on the other (see scatter diagram for r = .3, Figure 13.6). There
will also be fewer large products, since individuals with extreme z-scores
(either high or low) on one measure will seldom also have extreme z-scores
on the other. Hence, while the sum of the positive z-score products will
still exceed that of the negative z-score products, we would not expect the
net sum to be as large as when the correlation is high. In other words, the
mean z-score product will be smaller for low than for high degrees of re-
lationship.

Next, consider the case of unrelated measures. To say that two sets
of measures are entirely uncorrelated for a given collection is to say that
individuals above (or below) average on one measure are equally likely to
be above average, average, or below average on the other (see scatter
diagram for r = 0, Figure 13.6). For the entire collection, then, the number
of positive z-score products will be approximately equal to the number of
negative z-score products. Also the individual products will tend to be
small, since two extreme z-scores will seldom be found in the same pair.
Moreover, the sum of the negative products will tend to be approximately
the same size as that of the positive products, so that the algebraic sum
for the entire collection will approximate zero. Hence, in this situation the
value of r will be close to zero.
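The balancing of positive against negative products can be seen in miniature
in the A and D z-scores of Table 13.10, which were described earlier as showing
no systematic correlation. The Python sketch below is an added illustration;
the variable names are assumptions, and five pairs are of course far too few
to show the tendency as clearly as a large collection would.

```python
# z-scores of the A- and D-values as given in Table 13.10.
z_A = [1.5, 0.5, 0.0, -0.5, -1.5]
z_D = [-0.5, 0.5, 1.5, -1.5, 0.0]

products = [a * d for a, d in zip(z_A, z_D)]

# Separate the products by sign, then combine them algebraically.
positive_sum = sum(p for p in products if p > 0)
negative_sum = sum(p for p in products if p < 0)
r = (positive_sum + negative_sum) / len(products)

print(positive_sum, negative_sum, r)
```

The positive products only slightly outweigh the negative ones, so the mean
product, and hence r, lies close to zero.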

Finally, it should be apparent that if the correlation is negative, that
is, if most individuals above average on one measure are below average on
the other, then the z-score product for most pairs will be negative in sign
(see scatter diagrams for r = -.6 or r = -.9, Figure 13.6). The algebraic
sum of products, and hence the mean z-score product, r, will now be nega-
tive, and the absolute magnitude of this mean product will depend upon
the degree of relationship.
We may now summarize as follows:

(1) r will be positive when the correlation is positive and negative when the
correlation is negative*;
(2) r = +1 when the relationship is positive and perfect, and r = -1 when
the relationship is negative and perfect;
(3) when there is a complete lack of relationship, r = 0†;
(4) r will assume values between -1 and +1 for intermediate degrees of cor-
relation; the larger the absolute value of r, the higher or closer the corre-
lation.

We shall later demonstrate that r is not directly proportional to the
degree of correlation. That is, r = .3 does not imply "half" as close a cor-
relation as r = .6. That this is true is obvious from a comparison of the
scatter diagrams of Figure 13.6 to which we have made previous paren-
thetical reference. These scatter diagrams were developed to help the
student gain a sort of visual conception of the degree of correlation indicated
by values of various magnitudes. It is clear from this figure, for example,
that the change in the "closeness" of the correlation is much greater from


T. tor=.9 than from r= .1 to r= E 


*In fact, it is this characteristic of r which has resulted in the use of the
terms positive and negative to describe the type of correlation in statistics
rather than the terms direct and inverse which are commonly used in the
mathematics of variation.

†Note that this is not to say that if r = 0 a complete lack of correlation must
always exist. We shall later show that under certain circumstances, r may be
zero even if the relationship is perfect. r = 0 is a necessary but not a
sufficient condition for a complete lack of correlation.

Figure 13.6 Scatter diagrams of pairs of z-scores for selected values of r


13.6 LINEAR AND CURVILINEAR TYPES OF CORRELATION

Up to this point in our discussion we have ignored the possibility of
curvilinear types of correlation. In other words we have concerned our-
selves solely with pairs of variables which tended to be directly propor-
tional, that is, to be linearly related. All the scatter diagrams or bivariate
frequency distributions thus far presented reveal this tendency by the
elliptical configuration of their respective fields of points. Before consider-
ing curvilinearly correlated variables we shall indicate more precisely what
is meant by a linear or rectilinear type of relationship.


Figure 13.7 Scatter diagram of 37 pairs of hypothetical X- and Y-values
(horizontal axis: X-Scale; vertical axis: Y-Scale)


Figure 13.7 shows the scatter diagram for a set of 37 pairs of
hypothetical values.* In a loose sense the dots fall into a roughly
elliptical pattern. The solid line is the major axis of the ellipse and
corresponds to


*In this section we are concerned only with establishing a rough notion of
the general character of the scatter diagrams that would arise in the case
of samples from populations in which the relationship is linear as compared
with those arising from samples taken from populations in which the
relationship is curvilinear. Actually the elliptical pattern is that of the
contour lines formed by connecting cells of equal frequency in a bivariate
frequency table. If a bivariate frequency table were based on a large
sample from a population in which the relationship between two variables is
linear, the contour lines connecting points (cells) of equal frequency (we
might think of these contour lines as connecting points at which the
frequency is the same) would tend to be elliptical. It is really these
contour lines to which we have reference when we speak of an elliptical
pattern.




the lines shown in Figures 13.2, 13.3 and 13.5. In our previous discussion
we have more or less implied that this is the straight line about which the
points tend to be scattered. Actually, this implication is incorrect, or,
at most, true only in a very crude sense. Consider, for example, the four
Y-scores associated with the largest X-value (X = 6). Of these four scores
three are below this line. Similarly, three of the five Y-scores associated
with X = 5 are below this line. The Y-scores associated with small
X-values, on the other hand, tend to lie above this line. In fact, the only
subset of Y-scores which is symmetrically distributed about this line is
that which is associated with X = 3. Obviously there must be some line
which "fits" the Y-scores better than does this major axis. If we are to
describe the Y-scores as tending to be scattered about a straight line, it
would appear, then, that a better line to use would be the one which, if it
exists, passes through the means of the subsets of Y-scores which are
associated with the same X-score. For the subsets of such scores in the
collection of Figure 13.7 these means have been indicated by the open
squares and the line to which we refer is the one determined by them.

While this line obviously "fits" the Y-scores better than the major axis,
it does not fit the X-scores as well. Consider, for example, the subset of
X-scores associated with a Y-score of 4. Two of these 6 scores are to the
right of the major axis but only one is to the right of the line determined
by the means of subsets of Y-scores. This suggests that the trend of the
X-scores can best be represented by a still different line. For this
purpose we shall use the straight line, if it exists, which passes through
the means of the subsets of X-scores which are associated with the same
Y-score. In Figure 13.7 these subset X-score means have been indicated by
the open circles, and the line about which the X-values tend to scatter is
taken to be the one determined by these means. We shall refer to the
straight line determined by the subset Y-means (squares in Figure 13.7) as
the Y trend line and to the straight line determined by the subset X-means
(circles in Figure 13.7) as the X trend line. If, for a given collection of
pairs of imperfectly correlated scores, two such trend lines do, in fact,
exist, the scores are said to be linearly or rectilinearly correlated. Of
course if the relationship is perfect as well as linear, only one Y-score
value will be associated with any given X-score value and all points will
lie on a single straight line.

To summarize we shall state two conditions which must be satisfied before
the relationship between a given set of pairs of variables can be said to
be perfectly linear.


Conditions of Perfect Linearity (Rectilinearity)
1. The graphically plotted means of the Y-scores corresponding to a given
X-score must lie on a straight line.
2. The graphically plotted means of the X-scores corresponding to a given
Y-score must lie on a straight line.
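
The two conditions can be checked mechanically. The following is only a minimal sketch: the seven pairs are invented for illustration (they are not the data of Figure 13.7), and the helper names `subset_means` and `is_collinear` are hypothetical. It computes the subset means that define the Y trend line and the X trend line and tests whether each set of means falls on a straight line.

```python
from collections import defaultdict
from statistics import mean

def subset_means(pairs, by="x"):
    """Mean of the other variable for each distinct value of the grouping variable."""
    groups = defaultdict(list)
    for x, y in pairs:
        if by == "x":
            groups[x].append(y)   # Y-scores associated with the same X-score
        else:
            groups[y].append(x)   # X-scores associated with the same Y-score
    return {key: mean(vals) for key, vals in sorted(groups.items())}

def is_collinear(points, tol=1e-9):
    """True if all (a, b) points lie on one straight line (cross-product test)."""
    pts = list(points)
    if len(pts) < 3:
        return True
    (a0, b0), (a1, b1) = pts[0], pts[1]
    return all(
        abs((a1 - a0) * (b - b0) - (b1 - b0) * (a - a0)) <= tol
        for a, b in pts[2:]
    )

# Hypothetical pairs, invented for illustration only.
pairs = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2), (2, 1), (2, 2)]

y_means = subset_means(pairs, by="x")   # condition 1: mean Y for each X
x_means = subset_means(pairs, by="y")   # condition 2: mean X for each Y

cond1 = is_collinear(y_means.items())                       # points (X, mean Y)
cond2 = is_collinear((m, y) for y, m in x_means.items())    # points (mean X, Y)
print(y_means, x_means, cond1, cond2)
```

In this invented collection both sets of means are collinear, yet the points themselves do not lie on one line, so the collection is perfectly linear without being perfectly correlated.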




There are two very important points to note with regard to these
conditions. In the first place it should be clear from Figure 13.7 that
perfect linearity does not imply perfect correlation. Variables which
exhibit perfect linearity may exhibit any degree of correlation. In the
second place perfect linearity is an ideal condition which is rarely, if
ever, satisfied in real collections of data. In studying relationships, as
in studying averages or proportions, we are usually seeking population
facts. For example, we are not ordinarily so much interested in the
correlation between height and weight for a particular group of
twelve-year-old boys as in the correlation between height and weight for
all (i.e., the entire population of) twelve-year-old boys. It may be that
the condition of perfect linearity does not hold in the case of these
variables for this population, but even if it did, it is not likely that
perfect linearity would be found in a sample taken from it. Even if a large
number of pairs is selected, the number of weight scores associated with a
particular height score may be quite small, so that their mean will be
relatively unstable, that is, subject to a rather large sampling error.
Moreover, if the sampling is random, these errors are as likely to be in
one direction as in the other, with the result that the means of the
subsamples of weights associated with fixed heights may deviate rather
markedly from a straight line. Actually, then, the practical issue is not
whether the condition of perfect linearity holds for a given collection of
pairs, but rather whether the collection exhibits a sufficient tendency
toward linearity to warrant the assumption that the linearity condition
holds for the population represented. While a statistical test of the
significance of departure from linearity* is available, consideration of it
is beyond the scope of this text. We shall simply resort to a visual
inspection of the scatter diagram or bivariate frequency distribution. If
the field of points appears to be roughly elliptical, it is not
unreasonable to assume that the condition of linearity holds for the
population represented. If there is doubt it may be advisable to compute
the means of the Y (columns in a bivariate table) and X (rows in a
bivariate table) subsamples and to locate them on the diagram or table. If
linearity holds for the population, these Y- and X-means will each exhibit
a straight line tendency. These lines will intersect, with the angle
between them becoming smaller as the degree of correlation increases, and
will merge into a single line as r approaches unity. Since the subsample
means for a given collection of data are usually based on relatively small
numbers of cases, they may fluctuate quite markedly from a true
straight-line arrangement without vitiating the underlying assumption of
linearity for the population. The
*That is, a test of the hypothesis that in the population the means of the
subsets of Y-scores (or X-scores) which are associated with the same
X-score (or Y-score) fall on a straight line. For a description of this
statistical test, see E. F. Lindquist, Design and Analysis of Experiments
in Psychology and Education (Boston: Houghton Mifflin Co., 1953),
pp. 343-344.




critical point is that they appear to be arranged along a straight rather than 
a curved line. 

While it may be that heights and weights for a given sex and age
population are not perfectly linearly related, scatter diagrams of
height-weight data for samples from such populations are not indicative of
any marked departure from linearity.* The subsample weight means (@) and
subsample height means (O) for a sample of 50 twelve-year-old boys are
shown in the bivariate frequency distribution of Table 13.8. The subsample
weight means exhibit a very clear fit to a straight-line pattern. The fit
of the subsample height means is far less precise but it appears,
nevertheless,

TABLE 13.12 Bivariate Frequency Distribution for a Random Sample of 500
            from a Population Having a Correlation Coefficient of .7 and
            Satisfying Perfectly the Condition of Linearity

[Cell frequencies and score-scale labels are not legible in this copy; only
the column marginals below survive.]

f_X      12    25    55    70    92    90    85    41    18    12 | 500  49.9
Mean Y  32.4  39.4  43.2  43.4  48.1  51.5  54.7  60.0  64.8  66.6 | 49.1

Open circles with dots inside: means of X's associated with given Y's
Solid black dots: means of Y's associated with given X's
*Height-weight scatter diagrams are definitely curvilinear for samples from
populations of boys (or girls) when age is allowed to vary, i.e., when the
ages of the individuals comprising the population vary, say, from 4 to 17.
For samples from populations consisting of individuals of the same age,
this curvilinearity is not apparent.




that there is a tendency for these means also to fall along a straight
line. The heights and weights of this particular collection would,
therefore, be regarded as being linearly related, though, of course, not
perfectly so.

Table 13.12 is a bivariate table for a random sample of 500 pairs selected
from a population of pairs of X- and Y-values which is known to have a
correlation coefficient of .7 and to satisfy perfectly the condition of
linearity. The subsample Y-means (the solid black dots) and X-means (the
open circles with dots inside) have been located in this table with
reference to the score scales. It is clear that, even with a sample this
large, considerable fluctuation from a true straight-line pattern remains.

TABLE 13.13 Hypothetical Bivariate Frequency Distribution of Age and Memory
            Scores on a Test on Motion Picture Film Plots

[Cell frequencies and the memory-score scale are not legible in this copy;
only the column marginals below survive.]

Age group           8-12  13-17  18-22  23-27  28-32  33-37  38-42  43-47  48-52  53-57 | Total
f                     10     10     10     10     10     10     10     10     10     10 |   100
Mean memory score   24.5   45.5   54.0   53.5   53.5   52.5   50.0   44.0   34.0   16.5 |
Table 13.13 shows the bivariate frequency distribution for a hypothetical
set of 100 pairs of ages and memory scores on a test on motion picture film
plots.* The means (@) of subsamples of memory scores have been located in
this table with reference to the score scales. Clearly, neither the
individual scores nor the means of subsamples of memory scores follow a
straight line pattern. Instead of falling into an elliptical pattern the
individual pairs of scores appear to be scattered about in a curved field,
and the subsample means tend to fall along a curved line. The correlation
appears to be positive for ages 8 to 22, to be zero from ages 22 to 37, and


Figure 13.8 Boundaries of hypothetical fields of scatter-diagram points and
Y-trend curves for various types of curvilinear relationships (four panels,
each with X-Scale horizontal and Y-Scale vertical)

*Although these particular 100 scores were fabricated for the purpose of
illustration, they nevertheless were made to conform to data reported by
H. E. Jones, in Psychological Studies of Motion Pictures: II, Observation
and Recall as a Function of Age, University of California Publications in
Psychology, Vol. 3 (1928), pp. 225-43.



to be negative for ages above 37. Relationships such as this, which do not 
exhibit linearity, are said to be curvilinear. 

Many variations of curvilinearity are possible. For a few examples, see
Figure 13.8, which presents boundaries of hypothetical fields of
scatter-diagram points. In each of these examples, the points are scattered
about a curved line which is determined by the means of subgroups of
Y-scores (see Figure 13.7). Actually the shapes of these curved lines
characterize the underlying nature of a particular type of curvilinear
relationship, just as the straight line characterizes a linear
relationship. That is, whereas there is only one type of linear
relationship (a relationship is either linear or it is not), there are
innumerable types of curvilinear relationships.

Curvilinear relationships are much more difficult to describe than are
linear relationships. Not only must the type of relationship (i.e., type of
trend curve) be described but also the degree or closeness of relationship
must be indicated. The situation is complicated by the fact that both type
and degree may be one thing when the data are considered with reference to
the Y-trend curve and quite another when the data are considered with
reference to the X-trend curve. As in linear correlation the degree of
relationship may vary considerably for the same curve type. In Figure 13.8
A and D are identical in type, but the relationship is much closer in A, as
is indicated by the lesser width of its point field. That is, the pair
points in A do not deviate as widely from the trend curve as they do in D.
Just as in linear correlation, the degree of relationship in curvilinear
correlation is perfect if all individual pair points fall directly on the
trend curve, and the greater the tendency for the points to deviate from
this curve, the lower the degree of relationship.


13.7 EFFECT OF CURVILINEARITY UPON THE MEAN z-SCORE PRODUCT, r
Assume a projectile to be fired at an angle of 30° from the earth's
surface with a forward velocity of 1,600 feet per second. Then the
relationship between time (T) in flight in seconds and height (H) above
ground in feet is a perfect curvilinear relationship which, if air
resistance is neglected and the constant of gravity taken to be 32, is
described by the formula

$$H = 800T - 16T^2$$

Table 13.14 gives corresponding pairs of T- and H-values. For example, the
table shows that after an elapsed time of 5 seconds the height of the
projectile is 3,600 feet. The means and standard deviations of these time
and height scores were obtained and used in converting them into z-score
units which are also given in Table 13.14. Figures 13.9 and 13.10 show the
scatter diagrams for the pairs of measures in original and z-score units
respectively. It is obvious that the correlation is perfect with all




TABLE 13.14 Heights of a Particular Projectile After a Given Lapse of Time

   T        H      z_T     z_H    z_T z_H
   0        0     -1.6    -1.7     +2.72
   5    3,600     -1.3    -0.7     +0.91
  10    6,400     -0.9    +0.1     -0.09
  15    8,400     -0.6    +0.7     -0.42
  20    9,600     -0.3    +1.0     -0.30
  25   10,000      0      +1.1      0
  30    9,600     +0.3    +1.0     +0.30
  35    8,400     +0.6    +0.7     +0.42
  40    6,400     +0.9    +0.1     +0.09
  45    3,600     +1.3    -0.7     -0.91
  50        0     +1.6    -1.7     -2.72
                                    ----
                            Sum     0.00


points falling precisely on the trend curve. The pairs of z-score values
have also been entered in Figure 13.10 in parentheses adjacent to the point
represented, the first value in each instance being the z-score for T and
the second the z-score for H. Note that for each pair of scores in
Quadrant I of Figure 13.10 there is a pair in Quadrant II having the same
corresponding absolute values and that the same is true of the pairs of
scores in Quadrants III and IV. Now the signs of the scores of the
Quadrant I pairs are alike (both positive) whereas the signs of the scores
of the Quadrant II pairs differ. Consequently, the sum of the products of
pairs of z-scores in Quadrant I has the same absolute value but differs in
sign from that of

Figure 13.9 Scatter diagram for pairs of original time and height measures
for a particular projectile in flight (horizontal axis: T-Scale, seconds)

Figure 13.10 Scatter diagram for pairs of projectile time and height
measures in z-score units (Quadrants I to IV marked; each point labeled
with its pair of z-scores)


the pairs in Quadrant II and the net total z-score product for these two
quadrants is, therefore, zero. The same is true of the corresponding net
total for Quadrants III and IV. Hence, it follows that the value of the
mean z-score product, r, is zero, a value which we have previously learned
to interpret as indicative of a complete absence of relationship.
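
The cancellation can be reproduced numerically. This is a minimal sketch, assuming only the formula H = 800T - 16T² and the eleven times of Table 13.14; the helper name `z_scores` is an illustrative choice, and the standard deviation uses the divisor N so that r is the mean z-score product.

```python
from math import sqrt

T = list(range(0, 55, 5))                  # 0, 5, ..., 50 seconds
H = [800 * t - 16 * t * t for t in T]      # height in feet, air resistance neglected

def z_scores(values):
    """Convert raw scores to z-scores using the divisor-N standard deviation."""
    n = len(values)
    m = sum(values) / n
    s = sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / s for v in values]

z_t, z_h = z_scores(T), z_scores(H)
products = [a * b for a, b in zip(z_t, z_h)]
r = sum(products) / len(products)          # mean z-score product

for t, h, a, b, p in zip(T, H, z_t, z_h, products):
    print(f"{t:2d} {h:6d} {a:+.1f} {b:+.1f} {p:+.2f}")
print("r =", round(r, 10))                 # zero up to floating-point rounding
```

Because the table rounds each z-score to one decimal before multiplying, the printed products may differ from the tabled ones in the second decimal place; the sum is zero either way.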

It is obvious from the foregoing that the index r is not appropriate for
describing degree of correlation between curvilinearly related variables,
and that some index based on a more general definition of relationship will
be required in such situations. While such indexes are available, their
consideration is beyond the scope of this text. The critically important
point is that the student appreciate that the applicability of r is limited
to linear types of relationships. This implies that a required first step
in any correlation analysis be the preparation of either a scatter diagram
or a bivariate table for the purpose of determining whether the pairs of
values conform sufficiently to a linear (elliptical) pattern to justify the
use of r as a descriptive index.


13.8 THE CALCULATION OF r FROM THE ORIGINAL SCORE VALUES

To compute r by following the definition (13.1) is a formidable task. It
requires (1) finding the mean and standard deviation of each variate,
(2) converting all values of both variates to z-units by application of
either (7.1) or (7.2), (3) obtaining the products of the pairs of z-values,
and (4) determining the mean of these products. In this section we shall
present a procedure designed to reduce appreciably the tedium of these
computational steps.

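Let the original score values of the variates be represented by X and Y.
Then for pair i,

$$z_{X_i} = \frac{X_i - \bar{X}}{s_X} = \frac{x_i}{s_X} \qquad\text{and}\qquad z_{Y_i} = \frac{Y_i - \bar{Y}}{s_Y} = \frac{y_i}{s_Y}$$

Now substituting into (13.1) we obtain

$$r = \frac{\sum \dfrac{x_i}{s_X}\,\dfrac{y_i}{s_Y}}{N}$$

or

$$r = \frac{\sum x_i y_i}{N s_X s_Y} \qquad \text{(see Rule 3.1)} \tag{13.2}$$

Now substituting (6.5) for $s_X$ and $s_Y$ we obtain

$$r = \frac{\sum x_i y_i}{N\sqrt{\dfrac{\sum x_i^2}{N}}\sqrt{\dfrac{\sum y_i^2}{N}}}$$

or

$$r = \frac{\sum x_i y_i}{\sqrt{\left(\sum x_i^2\right)\left(\sum y_i^2\right)}} \tag{13.3}$$

Computation of the factors under the radical in the denominator is easily
effected from the X- and Y-values by application of (6.6). We shall derive
an analogous formula for the expression in the numerator of (13.3).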


RULE 13.1. The sum of the products of the deviations from their respective
means of pairs of scores in a collection of pairs is given by the
difference between the sum of the products of the pairs of scores and
product of their separate sums divided by their number. Or symbolically,

$$\sum x_i y_i = \sum X_i Y_i - \frac{\left(\sum X_i\right)\left(\sum Y_i\right)}{N} \tag{13.4}$$


Before presenting the proof of this rule we shall verify it in the case of
a specific example. Consider the following 5 pairs of scores (in each pair
the X-score is given first).

13, 50; 9, 90; 7, 130; 5, 10; and 1, 70.

Here

$$\sum X_i Y_i = (13)(50) + (9)(90) + (7)(130) + (5)(10) + (1)(70) = 2{,}490$$
$$\sum X_i = 13 + 9 + 7 + 5 + 1 = 35$$
$$\sum Y_i = 50 + 90 + 130 + 10 + 70 = 350$$

Now applying the rule, we have

$$\sum x_i y_i = 2{,}490 - \frac{(35)(350)}{5} = 40$$

To verify this result we need $\bar{X}$ and $\bar{Y}$. These are 7 and 70
respectively. Hence the pairs of score values expressed as deviations from
their respective means are

+6, -20; +2, +20; 0, +60; -2, -60; and -6, 0.

$$\sum x_i y_i = (+6)(-20) + (+2)(+20) + (0)(+60) + (-2)(-60) + (-6)(0) = +40, \text{ as before.}$$


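Proof. Given a collection of N pairs of scores $X_1, Y_1; X_2, Y_2; \cdots;
X_N, Y_N$. Let i represent any integer from 1 to N inclusive and let
$\bar{X}$ represent the mean of the X's and $\bar{Y}$ the mean of the Y's.
Then

$$\begin{aligned}
\sum x_i y_i &= \sum (X_i - \bar{X})(Y_i - \bar{Y}) \\
&= \sum (X_i Y_i - X_i\bar{Y} - \bar{X}Y_i + \bar{X}\bar{Y}) \\
&= \sum X_i Y_i - \bar{Y}\sum X_i - \bar{X}\sum Y_i + N\bar{X}\bar{Y}
\qquad \text{[see (3.19), (3.20), and (3.21)]}
\end{aligned}$$

Now substituting from (5.1) for $\bar{X}$ and $\bar{Y}$, we have

$$\begin{aligned}
\sum x_i y_i &= \sum X_i Y_i - \frac{(\sum Y_i)(\sum X_i)}{N} - \frac{(\sum X_i)(\sum Y_i)}{N} + N\,\frac{\sum X_i}{N}\,\frac{\sum Y_i}{N} \\
&= \sum X_i Y_i - \frac{(\sum X_i)(\sum Y_i)}{N}
\end{aligned}$$

which is the rule we wished to establish.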
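
As a quick numerical check of Rule 13.1, the sketch below evaluates both sides of (13.4) for the five pairs used in the verification above. The variable names are illustrative only.

```python
# The five (X, Y) pairs from the worked verification of Rule 13.1.
pairs = [(13, 50), (9, 90), (7, 130), (5, 10), (1, 70)]
n = len(pairs)

sum_x = sum(x for x, _ in pairs)
sum_y = sum(y for _, y in pairs)
sum_xy = sum(x * y for x, y in pairs)

# Right-hand side of (13.4): raw-score shortcut.
shortcut = sum_xy - sum_x * sum_y / n

# Left-hand side of (13.4): products of deviations from the means.
mean_x, mean_y = sum_x / n, sum_y / n
direct = sum((x - mean_x) * (y - mean_y) for x, y in pairs)

print(sum_xy, sum_x, sum_y, shortcut, direct)
```

The shortcut needs only three running totals, which is what makes it convenient for hand or desk-calculator work.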

Hence, by using (13.3) together with (6.6) and (13.4), we have a fairly
convenient procedure for computing the value of r directly from the
original X- and Y-values. By way of a simple example, we shall compute r
for the five pairs of X- and Y-values above. As already shown, application
of (13.4) to these pairs gives

$$\sum x_i y_i = 40$$

Application of (6.6) to the X- and Y-values gives

$$\sum x_i^2 = 13^2 + 9^2 + 7^2 + 5^2 + 1^2 - \frac{(13 + 9 + 7 + 5 + 1)^2}{5} = 80$$

$$\sum y_i^2 = 50^2 + 90^2 + 130^2 + 10^2 + 70^2 - \frac{(50 + 90 + 130 + 10 + 70)^2}{5} = 8{,}000$$
Hence substitution into (13.3) gives

$$r = \frac{40}{\sqrt{(80)(8{,}000)}} = \frac{40}{800} = .05$$

This computational procedure applied to the 50 pairs of height and weight
scores given in Table 13.2 is summarized below.

ΣH = 2,934               ΣW = 4,358                ΣHW = 257,950
ΣH² = 172,662            ΣW² = 397,384             (ΣH)(ΣW)/N = 255,727.44
(ΣH)²/N = 172,167.12     (ΣW)²/N = 379,843.28      Σhw = 2,222.56
Σh² = 494.88             Σw² = 17,540.72


Formula (13.5), below, is the same as (13.3) except that it incorporates
the instructions of (13.4) and (6.6) for obtaining the sum of products in
the numerator and two sums of squares in the denominator of (13.3) and thus
provides a formula explicitly in terms of raw- or original-score sums, sums
of squares, and sums of products.

$$r = \frac{\sum X_i Y_i - \dfrac{(\sum X_i)(\sum Y_i)}{N}}{\sqrt{\left[\sum X_i^2 - \dfrac{(\sum X_i)^2}{N}\right]\left[\sum Y_i^2 - \dfrac{(\sum Y_i)^2}{N}\right]}} \tag{13.5}$$

Application of (13.5) in the case of the last example gives:

$$r = \frac{257{,}950 - \dfrac{(2{,}934)(4{,}358)}{50}}{\sqrt{\left[172{,}662 - \dfrac{(2{,}934)^2}{50}\right]\left[397{,}384 - \dfrac{(4{,}358)^2}{50}\right]}} = \frac{2{,}222.56}{\sqrt{[494.88][17{,}540.72]}} = .754, \text{ as before.}$$
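
Formula (13.5) translates directly into a short function. The sketch below is illustrative only: the function name `r_from_sums` is hypothetical, the sums for the five pairs are those of the earlier example, and the height and weight sums are the values as read from the text (N = 50, ΣH = 2,934, ΣW = 4,358, ΣH² = 172,662, ΣW² = 397,384, ΣHW = 257,950).

```python
from math import sqrt

def r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Formula (13.5): r from raw-score sums, sums of squares, and sum of products."""
    sum_dev_xy = sum_xy - sum_x * sum_y / n      # (13.4): sum of deviation products
    sum_dev_x2 = sum_x2 - sum_x ** 2 / n         # (6.6): sum of squared x deviations
    sum_dev_y2 = sum_y2 - sum_y ** 2 / n         # (6.6): sum of squared y deviations
    return sum_dev_xy / sqrt(sum_dev_x2 * sum_dev_y2)

# Five-pair example: sum X = 35, sum Y = 350, sum X^2 = 325, sum Y^2 = 32,500, sum XY = 2,490.
r_five = r_from_sums(5, 35, 350, 325, 32500, 2490)

# Height (X) and weight (Y) sums for the 50 pairs, as read from the text.
r_hw = r_from_sums(50, 2934, 4358, 172662, 397384, 257950)

print(round(r_five, 3), round(r_hw, 3))
```

With these inputs the five-pair value rounds to .05 and the height-weight value to .754, in agreement with the hand computations.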


13.9 THE CALCULATION OF r FROM A BIVARIATE TABLE 
Table 13.15 is a reproduction of Table 13.7 except that in the upper
right-hand corner of each cell there has been entered the product of the
X- and Y-values for that cell. For example, the X- and Y-values for the
cell in the upper right-hand corner of the table are 5 and 7 respectively
and their product 35 appears in this cell. The $f_Y Y$ and $f_Y Y^2$
subtotals for each Y-score value appear in columns at the right of the
bivariate table proper. The sums of these subtotals are the Y- and
Y²-sums for the marginal Y-distribution. Similar subtotals and grand totals
for the marginal X-distribution appear below the bivariate table proper.

TABLE 13.15 Bivariate Frequency Distribution of 30 Pairs of Scores Showing
            Computations Necessary for Determination of r

[Cell entries are not legible in this copy; only the grand totals below
survive.]

N = 30     Σ f_Y Y = 164     Σ f_Y Y² = 916     Σ f_cell XY = 535
           Σ f_X X = 94      Σ f_X X² = 336


The extreme right-hand column headed $\sum^{\text{rows}} f_{\text{cell}}XY$
contains subtotals by rows of the XY-products. These are most readily found
by multiplying the cell frequency by the cell product number and summing
these products by rows. For example, in the second row from the top

$$\sum^{\text{row 2}} f_{\text{cell}}XY = (1)(12) + (3)(18) + (5)(24) + (2)(30) = 246.$$

The sum of these row subtotals (535) is the sum of XY-products for the
entire table. It is advisable for checking purposes also to obtain this
same sum of products by determining column subtotals for the XY-products
and then summing these subtotals. These column subtotals are shown in the
last row below the bivariate table, that is, the row headed
$\sum^{\text{col}} f_{\text{cell}}XY$. For example, in the second column
from the left

$$\sum^{\text{col 2}} f_{\text{cell}}XY = (1)(12) + (4)(10) + (1)(8) = 60.$$
This computational setup may be summarized symbolically as follows: 


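Let r = the number of rows and i represent any one row;
    c = the number of columns and j represent any one column;
    N = the number of individuals (pairs of scores) and k represent
        any one individual.

Then

$$\sum_{i=1}^{r} f_{Y_i} = \sum_{j=1}^{c} f_{X_j} = N = 30$$

$$\sum_{i=1}^{r} f_{Y_i} Y_i = \sum_{k=1}^{N} Y_k = 164$$

$$\sum_{j=1}^{c} f_{X_j} X_j = \sum_{k=1}^{N} X_k = 94$$

$$\sum_{i=1}^{r} f_{Y_i} Y_i^2 = \sum_{k=1}^{N} Y_k^2 = 916$$

$$\sum_{j=1}^{c} f_{X_j} X_j^2 = \sum_{k=1}^{N} X_k^2 = 336$$

$$\sum_{i=1}^{r}\left(\sum^{\text{rows}} f_{\text{cell}}XY\right)_i = \sum_{j=1}^{c}\left(\sum^{\text{col}} f_{\text{cell}}XY\right)_j = \sum_{k=1}^{N} X_k Y_k = 535$$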
Now (13.3) may be applied with (13.4) and (6.6) as follows. From (13.4)

$$\sum_{k=1}^{30} x_k y_k = 535 - \frac{(94)(164)}{30} = 535 - 513.867 = 21.133$$

and from (6.6)

$$\sum_{k=1}^{30} x_k^2 = 336 - \frac{(94)^2}{30} = 336 - 294.533 = 41.467$$

$$\sum_{k=1}^{30} y_k^2 = 916 - \frac{(164)^2}{30} = 916 - 896.533 = 19.467$$

Hence, (13.3) gives

$$r = \frac{21.133}{\sqrt{(41.467)(19.467)}} = \frac{21.133}{28.412} = .744$$
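
The same bookkeeping can be sketched in code. In the block below the small frequency table is invented for illustration (the cell frequencies of Table 13.15 are not reproduced here), and the names `r_from_sums` and `r_from_table` are hypothetical. Each cell contributes its X, Y, X², Y² and XY values weighted by the cell frequency, which is exactly what the marginal and cell-product subtotals accumulate.

```python
from math import sqrt

def r_from_sums(n, sx, sy, sxx, syy, sxy):
    """Formula (13.5): r from raw-score sums, sums of squares, and sum of products."""
    return (sxy - sx * sy / n) / sqrt((sxx - sx ** 2 / n) * (syy - sy ** 2 / n))

def r_from_table(table):
    """table maps (x, y) -> cell frequency; every sum is weighted by that frequency."""
    n = sum(table.values())
    sx = sum(f * x for (x, _), f in table.items())
    sy = sum(f * y for (_, y), f in table.items())
    sxx = sum(f * x * x for (x, _), f in table.items())
    syy = sum(f * y * y for (_, y), f in table.items())
    sxy = sum(f * x * y for (x, y), f in table.items())
    return r_from_sums(n, sx, sy, sxx, syy, sxy)

# Hypothetical cell frequencies, invented for illustration only.
table = {(1, 1): 2, (1, 2): 1, (2, 2): 3, (3, 2): 1, (3, 3): 2}
r_table = r_from_table(table)

# Cross-check: expand the table back into individual pairs and use the raw sums.
pairs = [(x, y) for (x, y), f in table.items() for _ in range(f)]
r_expanded = r_from_sums(
    len(pairs),
    sum(x for x, _ in pairs), sum(y for _, y in pairs),
    sum(x * x for x, _ in pairs), sum(y * y for _, y in pairs),
    sum(x * y for x, y in pairs),
)

# Grand totals quoted in the text for Table 13.15.
r_text = r_from_sums(30, 94, 164, 336, 916, 535)
print(round(r_table, 4), round(r_expanded, 4), round(r_text, 3))
```

The table-based and expanded-pair results agree because an ungrouped bivariate table loses no information; the totals quoted in the text give a value that rounds to .744.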


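Formula (13.5) may now also be stated specifically in terms of the
notation employed with the bivariate table.

$$r = \frac{\displaystyle\sum_{i=1}^{r}\left(\sum^{\text{rows}} f_{\text{cell}}XY\right)_i - \frac{\left(\sum_{j} f_{X_j}X_j\right)\left(\sum_{i} f_{Y_i}Y_i\right)}{N}}{\sqrt{\left[\displaystyle\sum_{j} f_{X_j}X_j^2 - \frac{\left(\sum_{j} f_{X_j}X_j\right)^2}{N}\right]\left[\displaystyle\sum_{i} f_{Y_i}Y_i^2 - \frac{\left(\sum_{i} f_{Y_i}Y_i\right)^2}{N}\right]}} \tag{13.6}$$

Application of (13.6) to the foregoing example gives

$$r = \frac{535 - \dfrac{(94)(164)}{30}}{\sqrt{\left[336 - \dfrac{(94)^2}{30}\right]\left[916 - \dfrac{(164)^2}{30}\right]}} = \frac{21.133}{\sqrt{(41.467)(19.467)}} = .744$$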


These procedures may also be employed with data organized into a grouped
bivariate frequency table, that is, a table whose rows and columns
correspond to intervals along the score scales. In this case, however, the
individual original score values are lost through classification and must
be treated as having the value of the midpoint of the interval
corresponding to the row or column. Consequently, the relationships shown
on page 393 are only approximate and the application of (13.3) with (13.4)
and (6.6), or the application of (13.6), yields values of r which involve
some "grouping error," that is, which are only approximations of those
derived directly from the original score values. However, if the grouping
is not too coarse these approximations are usually sufficiently accurate
for most practical purposes and the saving in computational labor through
the use of grouped data is usually considerable, especially when
calculating equipment is not available. Ordinarily, when a grouped
bivariate frequency table is organized for computational purposes the row
(Y-scale) and column (X-scale) intervals should be established in
accordance with the suggestions of Section 2.5 (see page 34 for a summary).
Though the intervals in Table 13.8 do not comply with these suggestions,
nevertheless, for illustrative purposes we shall use the grouped bivariate
distribution of 50 pairs of height and weight scores shown in this table.
This table, together with the needed cell-product entries and computational
columns, is reproduced as Table 13.16. Now using (13.6), we obtain


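$$r = \frac{256{,}195.50 - \dfrac{(2{,}929.0)(4{,}335.0)}{50}}{\sqrt{\left[172{,}100.50 - \dfrac{(2{,}929.0)^2}{50}\right]\left[393{,}702.50 - \dfrac{(4{,}335.0)^2}{50}\right]}} = \frac{2{,}251.20}{\sqrt{[519.68][17{,}858.00]}} = \frac{2{,}251.20}{3{,}046.38} = .739$$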
When this r was computed directly from the original height and weight
scores, the value .754 was obtained. Consequently, the grouping error in
this instance is -.015. That is

Grouping Error = .739 - .754 = -.015

For most practical purposes this amount of error in a correlation
coefficient is negligible.



TABLE 13.16 Bivariate Frequency Distribution of 50 Pairs of Height and
            Weight Scores Showing Computations Necessary for Determination
            of r

[Cell frequencies and cell products are not legible in this copy; the
marginal computations below survive.]

Weights (interval midpoints), row marginals:

   W      f_W    f_W W      f_W W²     Σ(rows) f HW
 134.5     1     134.5    18,090.25      8,944.25
 124.5     3     373.5    46,500.75     23,094.75
 114.5     2     229.0    26,220.50     14,083.50
 104.5     8     836.0    87,362.00     51,205.00
  94.5     4     378.0    35,721.00     22,302.00
  84.5    10     845.0    71,402.50     50,108.50
  74.5    13     968.5    72,153.25     54,571.25
  64.5     8     516.0    33,282.00     29,025.00
  54.5     1      54.5     2,970.25      2,861.25
 Total    50   4,335.0   393,702.50    256,195.50

Heights (interval midpoints), column frequencies:

   H     52.5  54.5  56.5  58.5  60.5  62.5  64.5  66.5 | Total
  f_H      2     6    13    11     7     9     0     2  |   50

Σ f_H H = 2,929.0        Σ f_H H² = 172,100.50


Although in the absence of computing equipment the use of a bivariate
table does reduce, to some extent, the computational labor involved in the
determination of r, the procedure as we have thus far described it is still
a tedious one. It is fortunately possible to introduce a variation which
results in a further reduction in labor that is of considerable
consequence. Before we can present this variation, it will be necessary to
show that r is invariant (not changed in value) under certain
transformations of the score scales.


RULE 13.2. Given N pairs of X- and Y-scores. Let a constant, A, be added to
each X and a constant B added to each Y. Then the value of r for the new
set of pairs thus formed is the same as that for the original set of pairs.
Or symbolically,

$$r_{(X+A)(Y+B)} = r_{XY} \tag{13.7}$$

Proof. Consider the deviation of any (X + A) value from the mean of all
(X + A) values.

$$(X + A) - M_{X+A} = (X + A) - (\bar{X} + A) = X - \bar{X} = x \qquad \text{(see Rule 5.3)}$$

Similarly,

$$(Y + B) - M_{Y+B} = y$$

$$\therefore\ \sum \left[(X_i + A) - M_{X+A}\right]\left[(Y_i + B) - M_{Y+B}\right] = \sum x_i y_i$$

Also, by Rule 6.1a we know that

$$s_{X+A} = s_X \qquad\text{and}\qquad s_{Y+B} = s_Y$$

Now calculating $r_{(X+A)(Y+B)}$ by (13.2) we obtain

$$r_{(X+A)(Y+B)} = \frac{\sum \left[(X_i + A) - M_{X+A}\right]\left[(Y_i + B) - M_{Y+B}\right]}{N s_{X+A}\, s_{Y+B}} = \frac{\sum x_i y_i}{N s_X s_Y} = r_{XY}$$

which establishes the rule.

Note that since A and/or B may be negative as well as positive this rule
applies when constant amounts are subtracted from the X- and Y-values as
well as when such amounts are added to them.

RULE 13.3. Given N pairs of X- and Y-scores. Let each X be multiplied by a
constant, C, and each Y by a constant, D. Then the value of r for the new
set of pairs thus formed is the same as that for the original set of pairs.
Or symbolically,

$$r_{(CX)(DY)} = r_{XY} \tag{13.8}$$

Proof. Consider the deviation of any CX-value from the mean of all
CX-values.

$$CX - M_{CX} = CX - C\bar{X} = C(X - \bar{X}) = Cx \qquad \text{(see Rule 5.4)}$$

Similarly

$$DY - M_{DY} = Dy$$

$$\therefore\ \sum_{i=1}^{N} (CX_i - M_{CX})(DY_i - M_{DY}) = \sum_{i=1}^{N} (Cx_i)(Dy_i) = CD\sum_{i=1}^{N} x_i y_i \qquad \text{(see Rule 3.1)}$$

Also by Rule 6.2a we know that

$$s_{CX} = C s_X \qquad\text{and}\qquad s_{DY} = D s_Y$$

Now calculating $r_{(CX)(DY)}$ by (13.2) we obtain

$$r_{(CX)(DY)} = \frac{\sum (CX_i - M_{CX})(DY_i - M_{DY})}{N s_{CX}\, s_{DY}} = \frac{CD\sum x_i y_i}{N\, C s_X\, D s_Y} = r_{XY}$$

which establishes the rule.

Note that since C and/or D may be fractions (e.g., 1/E or 1/F) this rule
applies when the X- and Y-values are divided by constant amounts as well as
when they are multiplied by constant amounts.

Considered jointly these rules show that the value of r remains invariant
under any linear transformation of the variables. That is,

$$r_{XY} = r_{(CX+A)(DY+B)} \tag{13.9}$$

and the same value of r is obtained when a constant is subtracted from each
X and each Y and the results are divided by constants.
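
Rules 13.2 and 13.3 are easy to check numerically. The sketch below reuses the five pairs from Section 13.8; the function name `pearson_r` and the particular constants are illustrative choices, not part of the text. It assumes the multipliers C and D are positive, as in the interval-coding use that follows; a negative multiplier on one variable would reverse the sign of r, and the last line shows that case.

```python
from math import sqrt

def pearson_r(xs, ys):
    """r by formula (13.3): deviation products over the root of the sums of squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ssx * ssy)

xs = [13, 9, 7, 5, 1]
ys = [50, 90, 130, 10, 70]

r_original = pearson_r(xs, ys)
r_shifted = pearson_r([x + 100 for x in xs], [y - 30 for y in ys])       # Rule 13.2
r_scaled = pearson_r([3 * x for x in xs], [y / 10 for y in ys])          # Rule 13.3
r_linear = pearson_r([(x - 1) / 2 for x in xs], [(y - 10) / 20 for y in ys])

# Caveat (outside the rules as stated): a negative multiplier flips the sign.
r_flipped = pearson_r([-x for x in xs], ys)

print(r_original, r_shifted, r_scaled, r_linear, r_flipped)
```

All four positive-multiplier results agree to within floating-point rounding, which is the property exploited when interval midpoints are replaced by small integer codes.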
Now consider the height scale of Table 13.16. Let the constant 52.5 
be subtracted from each H-value (i.e., cach interval midpoint) and let the 
result be divided by 2, the interval size. Then the linearly transformed 
values (Lj) of the interval midpoints become 0, 1, 2, 3, 4, 5, 6, and 7. For 
example, consider the last (largest) midpoint, 66.5. Here 
66.5 — 52.5 _ 14 
= o ке 
Also let the constant 54.5 be subtracted from each W-value and let the 
result be divided by 10 (the interval size). Then the Lyy-values of the 
Interval midpoints become 0, 1, 2, 3, 4, 5. 6, 7, and 8. If we now compute 
^ using these Ly- and Ly-scale values the result will be identical with that 
Obtained by using the H- and W-seale values. Obviously we have gained 
the advantage of working in terms of small integral scale values. The 
Computational work is not only less tedious but less subject to error. 
Table 13.17 shows for the bivariate distribution «Eid: 13.16 the 
Computation of rwn as carried out in terms of these d ;w-scales. The 
Tesult is, of course, identical with that previously 0 nnt . any 
If the intervals corresponding to the columns sr rov es the bivariate 
table are of uniform size it is quite unnecessary to use the ormula 
X-E 


to obtain the linearly transformed scale values. All we need do is enter the values 0, 1, 2, 3, ... as the column (or row) midpoints. It is not even




TABLE 13.17 Bivariate Frequency Distribution and Computation of r for Height and Weight Scores of Table 13.16 Linearly Transformed

(Columns: linearly transformed height scores, $L_H$; rows: linearly transformed weight scores, $L_W$.)

$$r = \frac{602 - \dfrac{(152)(161)}{50}}{\sqrt{\left[592 - \dfrac{(152)^2}{50}\right]\left[697 - \dfrac{(161)^2}{50}\right]}} = \frac{112.56}{\sqrt{(129.92)(178.58)}} = \frac{112.56}{152.32} = .739 \text{ as before.}$$


necessary to make the zero point in the transformed scale correspond to the interval having the smallest midpoint. It is, in fact, a common practice to make the zero point in the transformed scale correspond to the midpoint of a central column (or row). When this is done the transformed values of the interval midpoints below the zero midpoint are −1, −2, −3, ... while above the zero midpoint these values are +1, +2, +3, .... Although this complicates the procedure to the extent of requiring that algebraic signs be




TABLE 13.18 Bivariate Frequency Distribution and Computation of r for Height and Weight Scores of Table 13.16 Linearly Transformed with Central Zeros

(Columns: linearly transformed height scores, $L_H$; rows: linearly transformed weight scores, $L_W$.)

$$r = \frac{111 - \dfrac{(2)(-39)}{50}}{\sqrt{\left[130 - \dfrac{(2)^2}{50}\right]\left[209 - \dfrac{(-39)^2}{50}\right]}} = \frac{111 + 1.56}{\sqrt{(130 - 0.08)(209 - 30.42)}} = \frac{112.56}{\sqrt{(129.92)(178.58)}} = \frac{112.56}{152.32} = .739 \text{ as before.}$$

taken into account in making the computations, it results in the transformed values being still smaller in magnitude. This may represent a real advantage in situations in which the number of intervals is as large as is usually recommended for computational purposes. To illustrate the computation of r using a bivariate table and transformed scales with central zero points, we have again used the bivariate height-weight distribution of Table 13.16. This computation is presented in Table 13.18.
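The agreement between the two codings can be verified from the summary sums alone. The sketch below is illustrative: the function name `r_from_sums` is introduced here, the sums for the central-zero coding (N = 50, 2, −39, 130, 209, 111) are those appearing in the computation of Table 13.18, and for the zero-at-lowest-interval coding the sums 152 and 161 appear in Table 13.17 while 592, 697, and 602 are back-calculated from the intermediate results 129.92, 178.58, and 112.56 and should be read as an assumption.

```python
from math import sqrt

def r_from_sums(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy):
    """r from raw sums of coded scores:
    (Sxy - Sx*Sy/N) / sqrt[(Sxx - Sx^2/N) * (Syy - Sy^2/N)]."""
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_xx - sum_x ** 2 / n) * (sum_yy - sum_y ** 2 / n))
    return numerator / denominator

# Central-zero coding (sums as used in Table 13.18).
r_central = r_from_sums(50, 2, -39, 130, 209, 111)

# Zero at the lowest interval (Table 13.17); 592, 697 and 602 are
# back-calculated values, not figures read directly from the table.
r_lowest = r_from_sums(50, 152, 161, 592, 697, 602)

print(round(r_central, 3), round(r_lowest, 3))
```

Both codings lead to the same numerator (112.56) and the same denominator, so the choice of zero point affects only the size of the numbers handled along the way.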




13.10 INFLUENCE OF THE VARIABILITY OF THE MEASURES 
UPON THE MAGNITUDE OF r


If, in a study of the relationship between measures of two traits, we 
selected two groups of individuals or objects such that one group showed 
greater variability in these measures than the other, we would find that the 
coefficient of correlation r between the measures would be greater for the 
more variable than for the more homogeneous group. This fact may easily 
be inferred from a comparison of scatter diagrams of pairs of scores for 
homogeneous and heterogeneous groups.

For example, suppose in the study of the correlation between dis- 
crimination decision time and physically assessed dissimilarity of objects 
to be discriminated (see Section 13.2, pages 368-370) that only highly 
similar objects were used. Let us assume that for no set of objects was the 
physically assessed difference or dissimilarity score greater than 47. For 
the data of Table 13.5, Figure 13.11 shows the time-dissimilarity scatter 


FIGURE 13.11 Scatter diagram for 17 pairs of time-dissimilarity scores with no dissimilarity score exceeding 47
(Axes: T-Scale, vertical; D-Scale, horizontal.)




diagram for the 17 sets of objects of Table 13.5 for which the dissimilarity 
scores do not exceed 47. Clearly, this elliptical field of points is much 
broader in relation to the length of its major axis than is that of Figure 13.5, 
in which the dissimilarity scores involved ranged from 31 to 68.* The 
correlation between the 40 pairs of time-dissimilarity scores of Table 13.5 
(Figure 13.5) is −.70, a rather substantial negative correlation. The correlation between the 17 pairs of scores pictured in the scatter diagram of Figure 13.11, on the other hand, is only −.30.

As a second example, consider the correlation between scores on a reading comprehension test (R) and heights in centimeters (H) for a group of fourth-grade pupils. Since it is known that there is an almost complete lack of relationship between these variates for a group of such individuals, the boundary of the scatter diagram point field will be approximately circular. Now consider the same measurements for a group of third-grade children. An approximately equal lack of correlation will again be observed. However, while there will be some overlapping of the height and reading-score distributions for these two groups, we would expect that, on the average, the third-grade group would be lower both in height and reading comprehension than the fourth-grade group. Hence, the circle prescribing the boundary of the point field for the third-grade scatter diagram would lie below and to the left of the circle for the fourth-grade scatter diagram. Similarly, for a fifth-grade group we would expect the boundary circle, again indicating an almost complete lack of relationship, to lie above and to the right of the fourth-grade circle, and for a sixth-grade group we would expect the circle to lie above and to the right of that for the fifth grade. The placement of these various boundary circles is shown in Figure 13.12.

Now suppose we consider the boundary of the point field for the scatter diagram of such reading comprehension and height scores for a mixed group of third-, fourth-, fifth-, and sixth-graders. Obviously, points from each of these circular fields would be included and the resulting point field would have the shape of a long narrow ellipse indicating a substantial degree of correlation. Thus we see that while the correlation between reading comprehension and height may be virtually nil for the relatively homogeneous groups consisting of pupils at the same grade level, the correlation between these same variables may become substantial when determined for a heterogeneous group consisting of pupils at various grade levels.

These examples show the marked effect upon the magnitude of r resulting from either a curtailment or an increase in variability of the measures involved. The magnitude of the coefficient of correlation between measures of two traits for a given set of individuals or objects depends, then, upon


*In comparing Figures 13.11 and 13.5 the student should take into account differences in the physical distance representing the scale units.




FIGURE 13.12 Hypothetical circular boundaries of scatter-diagram point fields of reading comprehension (R) and height (H) scores for separate groups of third-, fourth-, fifth-, and sixth-grade pupils
(Axes: H-Scale and R-Scale.)


the variability of these measures for the given set, or, as the same idea is 
frequently expressed, it depends upon the "range of talent" of the set. 
Actually, the magnitude of the coefficient of correlation is, therefore, sub- 
ject to at least a degree of willful manipulation. It follows that it is not 
meaningful to speak of the correlation between any two traits or character- 
istics, apart from any description of the particular collection of individuals 
or objects involved. Statements such as, "the correlation between height and weight is .70," or "there is only a low correlation between intelligence and spelling ability," are indicative of loose thinking. Such statements can only be made with reference to a specific group and, hence, should always be accompanied by a description of the particular group involved, including a description of its variability in the measures concerned. Comparisons of degree of relationship should, therefore, not be based upon comparisons of r-values unless these values are established for groups that are at least approximately alike in "range of talent."
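The effect of "range of talent" described in this section can be imitated with simulated scores. The sketch below is hypothetical throughout: the grade means, standard deviations, group sizes, and the assumption that height and reading score are independent within a grade are all choices made for illustration, and `pearson_r` is a helper name introduced here.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical grade means: (height in cm, reading score).
GRADE_MEANS = {3: (130.0, 30.0), 4: (136.0, 42.0), 5: (142.0, 54.0), 6: (148.0, 66.0)}
HEIGHT_SD, READING_SD = 4.0, 8.0
PUPILS_PER_GRADE = 2000

within_r = {}
all_heights, all_reading = [], []
for grade, (mean_h, mean_r) in GRADE_MEANS.items():
    # Within a grade the two measures are generated independently.
    hs = [rng.gauss(mean_h, HEIGHT_SD) for _ in range(PUPILS_PER_GRADE)]
    rs = [rng.gauss(mean_r, READING_SD) for _ in range(PUPILS_PER_GRADE)]
    within_r[grade] = pearson_r(hs, rs)
    all_heights.extend(hs)
    all_reading.extend(rs)

pooled_r = pearson_r(all_heights, all_reading)
print({g: round(r, 2) for g, r in within_r.items()}, round(pooled_r, 2))
```

With these assumed values each single-grade coefficient stays near zero while the coefficient for the mixed group is substantial, mirroring the circles and the long narrow ellipse of Figure 13.12.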


13.11 REMARKS REGARDING THE MEANING OF A GIVEN VALUE OF r 
We have already noted that while the coefficient of correlation, r, is a convenient quantitative index of relationship, it may not be considered as




directly proportional to the degree of relationship (see Section 13.5, and 
also Figure 13.6). An r of .80, for example, may not be said to represent 
twice as close a relationship as one of .40, even though both are established 
for the same "range of talent." In order to make such a statement we 
would have to be able to describe, independently of r, precisely what we 
mean by closeness or degree of relationship, and no such description or 
definition that is generally acceptable has as yet been proposed. Lacking 
such a definition of "degree of relationship," we are unable to state in 
general how r changes in value for given changes in that degree.

It is important to recognize that r, after all, is only one of a number of 
possible arbitrary mathematical procedures which, when applied to a set 
of related measures, will yield a single numerical value somehow indicative 
of the degree of relationship. The coefficient r is based on z-score products. 
Other indexes, for example, could be derived from z-score differences for 
the pairs concerned, or from their z-score ratios, or from the squared differ- 
ences between pairs of z-scores, or from similar measures based on percentile 
ranks instead of z-scores, and so on. For the most part these possibilities 
do not possess the characteristics that make them as convenient to use and
interpret as r,* but which of them is most nearly directly proportional to 
the "degree of relationship" we cannot say, since this would depend upon 
how we defined degree of relationship. For precisely the same reason we 
cannot say in general that r is any better than other possible indexes in this
particular respect. 
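As a small illustration of two of the indexes just mentioned, the sketch below computes r as the mean z-score product and, alongside it, the mean squared difference between paired z-scores. The six pairs of scores are hypothetical, and the standard deviation is taken with divisor N (an assumption of this sketch). Under that convention the two indexes are tied together by the identity: mean squared z-difference = 2(1 − r), so the second index carries the same information as r on a different scale.

```python
from math import sqrt

def z_scores(values):
    """Deviation from the mean divided by the standard deviation (divisor N)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

# Hypothetical paired scores.
xs = [2, 4, 5, 7, 9, 10]
ys = [1, 3, 6, 6, 8, 12]

zx, zy = z_scores(xs), z_scores(ys)
n = len(xs)

r = sum(a * b for a, b in zip(zx, zy)) / n                  # mean z-score product
mean_sq_diff = sum((a - b) ** 2 for a, b in zip(zx, zy)) / n  # mean squared z-difference

print(round(r, 4), round(mean_sq_diff, 4), round(2 * (1 - r), 4))
```

Neither scale can claim to be the more nearly proportional to "degree of relationship," which is the point made in the paragraph above.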

Various schemes and devices have, nevertheless, been suggested to 
assist the student of statistics to appreciate the significance of a given value 
of r. Some of these are quite helpful in certain restricted types of situations, 
but may be seriously misleading in other situations or in general, and hence 
must be used with extreme caution. 

One of the most common and most misleading of these practices has 
been that of classifying certain r-values as "high," "medium," or "low." 
For example, an r of .30 or less has been said to be "low," one between .30 
and .70 "medium," one from .70 to .90 "high," and one above .90 "very
high." The numerical values of r corresponding to each of these verbal
categories have, of course, differed for various classifiers. The point is that
such classifications are invariably misleading, since what constitutes a 
“high” or a “low” correlation is a relative matter, and differs markedly for 
different types of variates. Coefficients of correlation as high as .5 between 
measures of a physical and a mental trait are extremely rare, and a cor- 
relation of .6 between two such traits would be considered phenomenal. On 
the other hand, correlations of this magnitude between reliable measures 


of two mental traits are quite common, and, hence, would be considered as

*As will be shown in the following chapter, the index r does arise in a mathematical solution to a somewhat different problem and, hence, possesses mathematical properties which make it preferable to the other possibilities suggested.




only "medium" for most groups in which we are interested. Again, a correlation of .9 between two independent measures of the same trait (for example, between the scores on two equivalent tests of spelling) might be considered as only "medium" or even "low," particularly if the tests were long and comprehensive. In this latter situation, an r of .6 would certainly be regarded as extremely low. There is no single classification, then, that is applicable in all situations, and because of the danger that they will be applied in situations in which they are not valid, it is best that
any and all such classifications be disregarded entirely by the beginning 
student. 

The fact remains, nevertheless, that the adjectives "high," "low," and 
"medium" are convenient to use with reference to correlation coefficients 
and degrees of correlation. We have, in fact, used them in this chapter and 
shall continue to do so. This may appear inconsistent with what has just 
been said. We shall try for the most part, however, to use these adjectives 
to refer only to the absolute mathematical magnitude of r. That is, the adjective "high" as we shall apply it with regard to a correlation coefficient refers to a value of r high up along the scale of possible values (near 1.00), the adjective "low" to a value of r near zero, and the adjective "medium" to a value of r near .50. Used in this sense, "high" does not imply "important" or "consequential," nor does "low" mean of "no importance" or "no consequence." It is important to distinguish between such use of the adjectives "low" and "high" and their use as names of categories in some classification scheme for interpreting or evaluating r as an index of degree of relationship.

In summary, then, it is recommended that the beginning student make no attempt to arrive at any absolute interpretation of r. He should look upon it simply as an index-value which is indicative of, but not linearly related to, the degree of relationship. When comparing r-values of different magnitudes, he should avoid trying to estimate "how much" closer the relationship is in one case than in another, but should be content instead with the knowledge that there is a difference of some indeterminate amount. He should be careful, also, never to compare r-values except when the relationships are of a comparable kind and the groups involved are comparable in "range of talent." If he wishes to secure a more adequate notion of what an r of a given magnitude really means, he can do no better than to study the scatter diagram or the distribution of tally marks from which it was computed.


13.12 CAUSAL VERSUS CASUAL OR CONCOMITANT RELATIONSHIP


One other very important admonition remains to be made. No more serious blunder in the interpretation of correlation coefficients can be committed than that of assuming that the correlation between two traits




is a measure of the extent to which an individual's status in one trait is 
caused by his status in the other. It is indefensible, for example, to argue 
that, because a high correlation exists between measures of reading com- 
prehension and arithmetic problem-solving ability for the individuals in a 
given group, problem-solving ability is therefore dependent upon reading 
comprehension or vice versa, that is, that a given student does well in reading 
because he is a good problem-solver. All of this may be true, but it does not 
follow from the statistical evidence of correlation. 

The observed correlation between measures of two traits is sometimes 
due to a cause-and-effect relationship between them, but there is nothing 
in the statistical evidence to indicate which, if either, is the cause and which 
the effect. For example, there is a fairly high correlation between age and 
grade status of elementary school children. In this case we know, of course, 
that we cannot increase a pupil's age simply by promoting him from one 
grade to the next—that age is not due to or caused by grade status—but 
we know this because of logical considerations which are quite independent 
of the statistical correlation. 

Again, correlations are sometimes observed between traits that have no 
cause-and-effect connection whatever, the observed correlation being due
entirely to a third factor (or to several factors) which is (or are) related to
each of the traits in question. For example, the correlation between reading
comprehension and height scores for a mixed group of third-, fourth-, fifth-, 
and sixth-grade children (see Figure 13.12 and Section 13.10) results from 
the effect of age and training as reflected by grade status. Since reading 
comprehension is related to training and to some extent age, and since 
height is related to age, and since age and training as reflected by grade 
status are in turn related, it necessarily follows that a relationship between 
reading comprehension and height will be present in any group whose 
members differ in grade status. Obviously, this is not to say that an individual is good in reading comprehension because he is tall in stature.

Or consider the positive correlation in the general population between 
ages of mothers at parturition and the intelligence of their offspring. This 
phenomenon is due to the fact that women of high intellectual standards
and ability tend, for economic and cultural reasons, to be married later in
life, and not because middle age is the best time to bear intelligent children.
Again in both these examples, however, we reached our interpretations or
conclusions on the basis of logical considerations which were quite independent of the direction or magnitude of any observed correlation.

Finally, the observed correlation between two traits may sometimes be 
in just the opposite direction from a cause-and-effect relationship which 
really exists. For example, in almost any high school or college course there 
is a negative correlation (of usually about −.30) between quality of grades
earned and number of hours spent in study. The students who make the 
highest grades tend to be those who spend the least time in studying, while 




those who make low grades tend to spend more than the average amount
of time in study. It would obviously be absurd, however, to contend on 
the basis of such evidence that anyone can make higher grades by studying 
less. The negative correlation is largely due to the fact that intelligence is 
positively related to quality of grades and negatively related to time spent 
in studying—that the less able students must study more even to approach, 
let alone equal, the achievement of their more able classmates. The causal
connection between quality of grades earned and time spent in study is 
positive, even though the observed correlation is negative. 
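A simulation can make this reversal concrete. The sketch below is a hypothetical model, not data from any course: ability, study time, and grades are generated in standard-score-like units, the coefficients (−0.8, 1.0, +0.3) and noise levels are assumptions chosen for illustration, and the direct effect of study on grades is deliberately made positive.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(1)  # fixed seed so the sketch is repeatable

ability, hours, grades = [], [], []
for _ in range(20000):
    a = rng.gauss(0, 1)                          # ability
    s = -0.8 * a + rng.gauss(0, 0.6)             # abler students study less
    g = 1.0 * a + 0.3 * s + rng.gauss(0, 0.5)    # study itself HELPS grades (+0.3)
    ability.append(a)
    hours.append(s)
    grades.append(g)

# Correlation over the whole mixed-ability group.
r_all = pearson_r(hours, grades)

# Correlation among students of nearly equal ability.
band = [(s, g) for a, s, g in zip(ability, hours, grades) if abs(a) < 0.1]
r_band = pearson_r([s for s, _ in band], [g for _, g in band])

print(round(r_all, 2), round(r_band, 2))
```

In this model the coefficient for the whole group is negative, while among students of nearly equal ability it is positive. The sign of the overall coefficient therefore says nothing about the sign of the causal connection built into the model.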

Whenever a substantial correlation is observed between two sets of measures, there are always the possibilities: (a) that there is no cause-and-effect connection; (b) that a cause-and-effect connection is present in the same direction as the observed correlation; and (c) that there is a cause-and-effect connection, but in the opposite direction from the observed correlation. Which of these possibilities exists, and what is the strength of the cause-and-effect connection (if any), cannot be determined from the observed correlation. Any interpretations concerning cause-and-effect must be based on logical considerations and not on the observed correlation. The observed correlation may suggest a cause-and-effect relationship, but can never prove that it exists, or show in what degree it exists.




14 


THE PREDICTION PROBLEM 


14.1 STATEMENT OF THE PROBLEM 


Suppose that we have for each of a number (N) of individuals or objects
measures of two characteristics which are not perfectly correlated. For
individual i we shall represent these two scores as $X_i$ and $Y_i$. Now suppose
that for some individual(s), not included among the N, we have available 
only the X-score(s). The problem is to utilize the information or experience 
embodied in these N pairs of scores to make an estimate of the Y-score(s) 
for this (these) latter individual(s). 

For example, suppose that for a large number of individuals we have 
some measure (X), such as grade-point average, of high school achieve- 
ment and a similar measure (Y) of achievement in college. Now suppose 
we are confronted with the problem of advising some recent high school
graduate who is considering attending college. Our information regarding 
this individual is assumed to be largely limited to knowledge of his high 
school achievement. Our problem is to employ our knowledge of, or past 
experience regarding, the relationship between high school and college 
achievement to estimate for this individual what his college achievement 
record (Y) would be were he to attend college, given only information 
regarding his high school achievement (X).

Other situations involving the same problem are numerous. The 
Wechsler Intelligence Scale for Children (WISC) must be individually ad- 
ministered by a specially trained expert who would have difficulty in 




averaging more than four such testings per school day. The Henmon-Nelson Tests of Mental Ability may, on the other hand, be given to large groups of children in about 30 minutes. Here the problem is to use our experience with children who have taken both these tests to estimate a particular child's WISC score given his Henmon-Nelson score. Or the problem may be to estimate on the basis of past experience with the performance of a large number of individuals on some test (sometimes several tests are used) and their subsequent success on some job or task (e.g.,
selling a certain product, learning to fly a plane, practicing medicine, sur- 
viving a certain surgical operation), the success on this job or task of some 
individual, given only a record of his performance on the test. 

Because in so many of the situations in which this problem arises the required estimates pertain to some future status, it is customary to refer to these estimates as predictions and to the general problem as the prediction problem. In the following sections of this chapter we shall present a solution for this problem and give some attention to the accuracy of the predictions it provides.


14.2 A POSSIBLE SOLUTION TO THE PREDICTION PROBLEM
AND ITS WEAKNESSES

Suppose we let the individual whose Y-score we wish to predict be designated as d and let his known or given X-score be designated $X_d$. Now we wish to apply our past experience with individuals whose X- and Y-scores are both known to us to estimate or predict d's Y-score. One rather obvious approach would involve sorting out from among all the individuals with whom we have had past experience, those whose X-scores are the same as d's, that is, of magnitude $X_d$. The individuals constituting this specially selected subgroup are all like d in terms of performance or status on the X-test or X-characteristic. We shall now study the Y-scores for this subgroup. We would not, of course, expect every member of this subgroup to make precisely the same Y-score, since we have not required, in setting up the problem situation, that the relationship between X and Y be perfect. There will be more or less variation among the Y-scores for this subgroup depending upon whether the correlation between X and Y is low or high.

Now we shall view this subgroup of Y-scores made by individuals whose X-scores all have the value $X_d$ as though it were a random sample from a subpopulation of such individuals (i.e., individuals whose scores on the X-trait are all of magnitude $X_d$). We shall also regard the individual d, who is a member of this subpopulation, as having been selected at random from it. Since d's Y-score is unknown, we do not know just where along the scale of values of the various Y-scores of this subpopulation the




particular Y-score for d falls. As a guess (estimate or prediction), however, 
we shall use an estimate of the mean of this subpopulation, since the 
expected value of a score selected at random from a population is the mean 
of the population (see Rule 5.7). This, of course, is to say that we shall 
simply use the mean of the subsample.

For example, suppose that we are informed that the height of a ran- 
domly selected twelve-year-old boy is 63 inches and are asked to estimate 
or "predict" his weight. Suppose further, that our past experience with 
the heights and weights of twelve-year-old boys is as shown in Table 13.2. 
From the 50 pairs of height and weight scores given in this table, we select 
those pairs in which the height score is 63, that is, the same as that of the 
particular twelve-year-old boy whose weight is to be estimated. There are 
three such pairs of scores, viz., (63, 105), (63, 101), and (63, 81). The weight 
scores of these three pairs are now regarded as a random sample from the 
subpopulation of weight scores for twelve-year-old boys who are 63 inches 
in height. We shall use the mean of this sample, that is, (105 + 101 + 81)/3
= 95.7 as an estimate of the mean of this subpopulation of weights, and this
population estimate in turn as the estimated or predicted weight of the particular boy in question.
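The subgroup-mean approach just described can be sketched in a few lines. The function name `predict_from_subgroup` is introduced here for illustration. The three pairs with height 63 are those quoted above from Table 13.2; the remaining pairs are hypothetical fillers standing in for the rest of the table.

```python
def predict_from_subgroup(pairs, x_given):
    """Predict Y for a case with X = x_given as the mean Y of all past
    cases having exactly that X-score. Returns (prediction, subgroup size)."""
    ys = [y for x, y in pairs if x == x_given]
    if not ys:
        raise ValueError("no past cases with this X-score")
    return sum(ys) / len(ys), len(ys)

pairs = [
    (63, 105), (63, 101), (63, 81),   # the three pairs quoted in the text
    (60, 84), (61, 92), (65, 110),    # hypothetical filler pairs
]

predicted_weight, n_cases = predict_from_subgroup(pairs, 63)
print(round(predicted_weight, 1), n_cases)
```

Note that the filler pairs play no part in the result: only the cases with exactly the given X-score are used, and here there are just three of them.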

Now, there are certain rather obvious weaknesses in this approach. 
We shall simply mention two that are particularly critical. In the first 
place, even if our over-all experience with the two characteristics involved relates to a large number of individuals, it is not likely that our experience with individuals whose X-scores are of magnitude $X_d$ will be very extensive. That is, among all the individuals for whom we have information about X and Y performance or status, there may be only a few whose X-scores are the same as that of d. It follows that our estimate of the mean of the particular subpopulation of Y-scores involved is likely to be based on a rather small sample and, hence, is likely to involve a large sampling error. In the height-weight problem, for example, our estimate of the mean of the particular subpopulation of weights involved had to be based on a sample of only three cases.

In the second place, the approach suggested is highly inefficient in the sense that it makes so little use of the sum total of the available experience with the characteristics involved. Attention is given only to the single subgroup of Y-scores each of which is paired with $X_d$. The subgroups of Y-scores paired with $X_d + 1$, $X_d + 2$, or $X_d - 1$, etc., are completely ignored. It is quite probable that these subgroups may contain information regarding trends in the general level of the Y-scores associated with different X-scores that would, if taken into account, make possible more accurate estimation or prediction.

For these reasons we shall abandon the solution here suggested in favor 
of one less subject to the weaknesses just cited. 




14.3 A PREFERABLE SOLUTION TO THE PREDICTION PROBLEM 
IN A SPECIAL Case: LINEAR PREDICTION 


Suppose that the characteristics or traits with which we are concerned 
are linearly related for the population involved. This means, theoretically,
that for the entire population the means of the subpopulations consisting
of Y-scores which are paired with the same X-score lie on a straight line
(see Section 13.6). In this special case (i.e., the case of linear correlation) 
we can improve our method of estimating the mean of any such subpopula- 
tion over that previously suggested by using all the data to determine the 
line which best fits the subsample means. The ordinates (Y-values) of the 
points on this line corresponding to the different X-values may then be used 
as estimates of the means of the subpopulations of Y-scores which are 
associated with given X-score values. 

TABLE 14.1 A Random Sample of 20 Pairs of X- and Y-Values Selected from a Population for Which X and Y Are Known To Be Linearly Related

TABLE 14.2 Means of Subsamples of the Y-Values of Table 14.1 Which Are Associated with Same X-Value

By way of illustrating this scheme of attacking the prediction problem, assume the 20 pairs of X- and Y-scores given in Table 14.1 to be a random sample from a population of linearly correlated pairs. Table 14.2 gives the means of the subsamples of Y-values associated with like X-values. For example, three Y-values, 10, 8, and 6, having the mean 8, are associated with an X-value of 9. Table 14.2 gives the means of this and the other similar subsamples of Y-values. The scatter diagram for the 20 pairs of Table 14.1 is shown in Figure 14.1. The open circles in this figure locate



the means of the subsamples of Y-scores associated with like X-scores.
Clearly, these subsample Y-means do not fall on a straight line. However,
they do exhibit a tendency to do so. Since the particular pairs of scores
involved were, in fact, selected at random from a population of pairs for
which the X and Y relationship is known to be linear, it follows that the
deviations of these Y subsample means from a straight-line pattern must 


FIGURE 14.1 Scatter diagram for pairs of X- and Y-values of Table 14.1
(Axes: X-Scale, horizontal; Y-Scale, vertical.)


be due to sampling error. In fact, we would expect the sampling errors 
involved in these means to be considerable inasmuch as the subsamples 
on which they are based are extremely small (one to three cases). A straight
line fitted to these subsample means provides us with estimates of the 
subpopulation means which are more precise than those provided by the 
individual subsample means because the placement of the line is based 
on the joint or simultaneous consideration of all the subsample means, and 
therefore, not only takes into account more of the information available in 
the data but also has the effect of "smoothing out" the sampling errors in 
these subsample means. 




In Figure 14.1 the line AB was fitted to the subsample means by the 
simple device of sliding a transparent straight edge into that position which 
would appear to the eye to represent the line of "best fit” to these subsample 
means. A square has been placed at each point on this line which corresponds to a particular integral value of X. The Y-values corresponding
to these squares become our estimates of the subpopulation means. We 
read from the figure, for example, that for the subpopulation of individuals 
whose X-scores are 4, the mean of their Y-scores, thus estimated, is 3.6. 

Although the procedure just described provides estimates of the sub- 
population means which are superior to those provided by the individual 
subsample means, it still involves several weaknesses. We shall cite two. 
In the first place, the estimates it provides are not unique. This obviously 
results from the fact that in thus visually fitting a line to a given set of sub- 
sample means, different individuals would be most likely to select some- 
what different placements of the line and consequently would obtain 
different estimates of the subpopulation means, in spite of the fact that the 
same data were involved in each instance. Secondly, the procedure gives 
equal importance or weight to each subsample mean in spite of the fact that some are based on larger subsamples than others. In placing line AB in Figure 14.1, for example, as much attention was given the Y-mean of the subsample for X = 10 as was given the Y-mean of the subsample for X = 9 in spite of the fact that the former is based on only a single case while the latter is based on three cases. Obviously, the accuracy of the fitting procedure could be greatly improved if some way could be found to provide for weighting each subsample mean in accordance with the size of the subsample involved. Actually this is not difficult to accomplish. All we need do is fit the line to the individual Y-values (i.e., to the dots of the scatter diagram) rather than to the subsample Y-means. Of course, this requires giving attention to many more points, a procedure which clearly increases the difficulty of establishing an optimum placement visually, and which, therefore, increases the likelihood that different individuals working with the same data will establish different lines. If, then, the lines are to be fitted with reference to individual values, some method that is more precise than the crude visual one which has been suggested must be found. A method for uniquely determining such a "best" fitting line has been provided by the mathematicians. A description of this method is given in the following section.


14.4 FITTING A PREDICTION LINE BY THE METHOD OF LEAST SQUARES 


The general formula or equation for any straight line located with
reference to a set of rectangular coordinate axes is

$$Y = bX + c \qquad (14.1)$$




Points plotted to correspond to pairs of X- and Y-values which satisfy 
this equation all fall on a straight line. The placement of a particular line 
depends upon the values assigned the constants b and c in the particular 
instance. The value assigned c obviously indicates the point at which the 
line intercepts the Y-axis, since Y = c when X = 0. The value assigned b 
indicates the slope of the line, that is, the vertical distance the line rises 
(b positive) or falls (b negative) per unit of horizontal distance. These two 
pieces of information—slope and Y-intercept—are all that is necessary to 
locate or place a particular line with reference to a set of rectangular
coordinate axes.

For example, consider the particular line

$$Y = .5X - 2$$

Here b, the slope, is .5 and c, the Y-intercept, is -2. The location of this
line with reference to a set of rectangular coordinate axes is shown in
Figure 14.2 as line AB. The line

$$Y = -2X + 3$$

which has a slope of -2 and a Y-intercept at +3 is also shown in Figure
14.2 as line CD.
Figure 14.2 Examples of lines placed on basis of slope and
Y-intercept information

Now for a given set of imperfectly correlated X- and Y-pairs, the
problem of obtaining the "best-fitting" Y-prediction line can be reduced
to the determination of appropriate values for b (the slope) and c (the
Y-intercept). Before a procedure for determining these values can be
established, however, it is first necessary to specify precisely what is meant
by "best-fitting." Various definitions are possible. We shall consider one.

Figure 14.3 Scatter diagram of five hypothetical X- and
Y-pairs with two lines of differing closeness of fit

Figure 14.3 shows a scatter diagram for five imaginary pairs of X- and
Y-values. Two lines AB and CD have been drawn in this figure. Neither
of these lines provides anything even approaching a close fit to the points
of this scatter diagram. Of these two lines, however, the fit of AB is clearly
the better. If we measure the vertical (i.e., vertical with reference to the
X-axis) distances of the points from AB (see broken lines from points to
AB in Figure 14.3) it is clear that in the aggregate they are less than the
distances of the points from CD (see solid lines from points to CD in Figure
14.3). Clearly, it would be no problem to draw some third line in Figure
14.3 which would provide a far better fit to the points than AB. If such a
line were drawn, the distances of the points from it would total less than those
of the points from AB. This suggests the use of the aggregate distance of
the points from a line as an index of the "goodness-of-fit" of the line to the
points. The smaller this total distance, the better the fit. Unfortunately,
however, the use of the absolute values of these distances results in an index 
which is awkward to handle mathematically. The situation is much the 
same as that which led us to adopt the variance in preference to the mean 
deviation as an index of variability (see Section 6.3). Here we shall discard 
the total of the absolute deviations of the points from the line as an index 
of “goodness-of-fit” and use instead the total of the squares of these alge- 
braic deviations. We are now in a position to set up a precise definition of 
the phrase "best-fitting" as it applies to a straight line placed in the point
field of a scatter diagram. We shall simply define the "best-fitting" line as
the one for which the value of our “goodness-of-fit” index is least. This 
definition states what is generally known in statistics as the least-squares 
criterion of fit. We shall now consider a somewhat more formal statement 
of this criterion as it specifically applies to the prediction problem. 
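The comparison of two candidate lines by this index can be sketched in a few lines of Python. This is only an illustration and not part of the original text: the five points and the two lines (labelled AB and CD by analogy with Figure 14.3) are hypothetical values, and the function names are our own.

```python
# Hypothetical data: five (X, Y) pairs invented for illustration only.
points = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]


def squared_deviation_index(points, b, c):
    """Total of the squared vertical deviations of the points from Y = bX + c."""
    return sum((y - (b * x + c)) ** 2 for x, y in points)


def absolute_deviation_index(points, b, c):
    """Total of the absolute vertical deviations (the index the text sets aside)."""
    return sum(abs(y - (b * x + c)) for x, y in points)


# Two hypothetical candidate lines, each given as (slope, Y-intercept).
line_ab = (0.8, 0.6)
line_cd = (0.0, 3.0)

g_ab = squared_deviation_index(points, *line_ab)
g_cd = squared_deviation_index(points, *line_cd)

# The line with the smaller index is the better fitting of the two.
better = "AB" if g_ab < g_cd else "CD"
print(f"G for AB = {g_ab:.2f}, G for CD = {g_cd:.2f}, better fit: {better}")
```

Under the least-squares criterion only the squared index is used; the absolute version is included merely so the two can be compared on the same points.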
Let the prediction equation be represented by

$$\hat{Y} = bX + c \qquad (14.2)^*$$

Then for a given pair of values, $X_i$ and $Y_i$, the vertical distance of the
corresponding point from this line is given by

$$Y_i - \hat{Y}_i = Y_i - (bX_i + c)$$

where $\hat{Y}_i$ is the point on the line corresponding to $X_i$ (see Figure 14.4). We
shall refer to this distance simply as the deviation of the point from the
line. Now suppose we have N pairs of X- and Y-values. Then our index
(G) of the "goodness-of-fit" of the line to these particular points is by definition

$$G = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - [bX_i + c])^2 \qquad (14.3)$$

and the "best-fitting" line according to the least-squares criterion is that
line for which the value of G is least.

Figure 14.4. Distance of point $X_i$, $Y_i$ from the line as
measured along a perpendicular to the X-axis

*The caret above the Y is to remind the reader that the value of bX + c, while in units
of the Y-scale, is actually an estimate of a subpopulation mean. The caret was used instead
of the bar since the latter is reserved to indicate the actual obtained mean of some
specific set of scores.

Mathematical statisticians have proved that the values of b and c
which result in a minimum value of G for a particular set of N points are
those given by the following formulas:

$$b = \frac{\sum x_i y_i}{\sum x_i^2}, \quad \text{where } x_i = X_i - \bar{X},\ y_i = Y_i - \bar{Y} \qquad (14.4)$$

$$c = \bar{Y} - b\bar{X} \qquad (14.5)$$

In these formulas $\bar{Y}$ and $\bar{X}$ represent, respectively, the means of all
Y-scores and all X-scores, of which there are N each. To use (14.5) it is, of
course, first necessary to obtain the value of b by (14.4). We shall illustrate
the application of these formulas in the next section.
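Formulas (14.4) and (14.5) translate directly into code. The following is a minimal sketch, assuming a small hypothetical data set; the function names `fit_least_squares` and `g_index` are our own and do not come from the text.

```python
def fit_least_squares(xs, ys):
    """Return (b, c) for the line Y-hat = bX + c using (14.4) and (14.5)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum_xy / sum_xx        # (14.4)
    c = y_bar - b * x_bar      # (14.5)
    return b, c


def g_index(xs, ys, b, c):
    """The goodness-of-fit index G of (14.3) for the line Y-hat = bX + c."""
    return sum((y - (b * x + c)) ** 2 for x, y in zip(xs, ys))


# Hypothetical scores, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

b, c = fit_least_squares(xs, ys)
print(f"b = {b:.3f}, c = {c:.3f}, G = {g_index(xs, ys, b, c):.3f}")
```

Moving b or c slightly away from the fitted values and recomputing `g_index` gives a larger G, which is what the least-squares criterion requires.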


14.5 THE PROBLEM OF THE HIGH SCHOOL COUNSELOR

In his capacity as a high school counselor, Mr. Jones is frequently
called upon to advise certain graduating students on their potential for
success in college. In the past, Jones has given a certain college aptitude
test to graduating students planning to attend college and then, after the
lapse of a year, has obtained from each college involved a report on their
success in college in the form of their freshman-year grade-point averages.
He uses this experience, together with other information about the student,
as a basis for predicting college success. We shall illustrate here how Mr.
Jones, using only his past experience with this aptitude test and freshman- 
year college grade-point average, might predict the freshman-year college 
grade-point average for one of his current advisees. 

First, of course, Mr. Jones will utilize his past experience with the two 
variables involved to derive a prediction equation or formula of the type 
described in the preceding section. Table 14.3 shows the record of this past 
experience.* Below this table are the computations leading to the determination
of $\bar{X}$, $\bar{Y}$, $\sum x^2$, and $\sum xy$. These are the values needed to determine
b and c by means of (14.4) and (14.5). The computation of $\sum y^2$ is also
shown since we shall have need for this value later. Though the work is


*No sensible counselor would, under ordinary circumstances, be satisfied with the limited
amount (N = 50) of experience recorded in Table 14.3. We have greatly reduced the
number of cases that would usually be used simply for convenience of illustration.




TABLE 14.3 Scores on a Scholastic Aptitude Test (X) and
Freshman Year Grade-Point Averages (Y) for 50
Randomly Selected College Students


X Y X Y X Y X Y X Y
14 4.0 11 10 29 10 1.4 Б 2.6 
4 34 11 l0 $8 gq 26 8 A 
14 3.2 11 10 27 y 57 i og 
lm 57 11 10 2.6 y AH 8 L8 
S. e 11 10 22 9 24 8 14 
2 2 11 24 10 2.1 9 21 8 TU] 
12 24 11 2.2 10 19 9 "ng 8 09 
2 2.2 11 2.0 10 1.8 9 18 8 0.8 
12 21 11 19 10 1.7 9 1.1 7 17 
1 3.5 10 3:2 10 1.6 9 0.9 7 0.8 
$\sum X = 505$
$\bar{X} = 10.1$
251 
100.5 (XY): 
150.5 


arranged in columns instead of on a line, the student will recognize that
$\sum x^2$ and $\sum y^2$ were computed by application of (6.6) and that $\sum xy$ was
computed by (13.4).

Now using (14.4) and (14.5), Mr. Jones finds

$$b = \frac{\sum xy}{\sum x^2} = .284, \quad \text{and}$$

$$c = 2.248 - (.284)(10.1) = -.620$$


Substituting these results into (14.2), Mr. Jones obtains the following
prediction equation:

$$\hat{Y} = .284X - .620$$

Next Mr. Jones will administer the college aptitude test to the advisee
involved to obtain his X-score. Assume this score turns out to be 13. Mr.
Jones then substitutes 13 for X in the prediction equation to determine an
estimate of the particular advisee's expected freshman-year college grade-point
average.

$$\hat{Y} = (.284)(13) - .620 = 3.07 \approx 3.1$$

Since the expected value is well above the mean (2.248 ≈ 2.25) for the
entire group, the counselor advises the student that his chances of a successful
college career appear to be very good. He will, of course, encourage
such an advisee to make every effort to attend college.
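Mr. Jones's arithmetic can be retraced with a short sketch that starts from the summary values reported in the text (the two means and the slope) rather than from the raw scores of Table 14.3. The variable and function names are our own.

```python
# Summary values as reported in the text for the data of Table 14.3.
x_bar = 10.1    # mean aptitude-test score
y_bar = 2.248   # mean freshman-year grade-point average
b = 0.284       # slope obtained from (14.4)

# Intercept from (14.5).
c = y_bar - b * x_bar


def predict(x):
    """Estimated subpopulation mean of Y for aptitude score x, per (14.2)."""
    return b * x + c


y_hat_13 = predict(13)
print(f"c = {c:.3f}; predicted grade-point average for X = 13: {y_hat_13:.2f}")
```

Because the intercept is computed here from the rounded slope, later decimal places may differ slightly from a calculation carried out on the unrounded sums.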




It will be observed that it is quite unnecessary for Mr. Jones to actually 
locate the line corresponding to the prediction equation with reference to a 
set of coordinate axes in order to effect the predictions. In fact, the only 
reason for ever plotting the scatter diagram at all would be to ascertain 
whether or not the assumption of linearity is justifiable. This, of course, is 
reason enough. The scatter diagram for the 50 pairs of values given in 
Table 14.3 is shown in Figure 14.5. It is clear that for these data an assump- 
tion of linearity is quite justifiable. To illustrate further the theory in- 
volved, the means of the subgroups of Y-values corresponding to the 
different X-values are shown as open squares (□) in Figure 14.5. The prediction
line is also shown in this figure. Of course, for purposes of the practical
application of this prediction process, it is not necessary to show on
the scatter diagram either these subgroup Y-means or the prediction line.
The importance of plotting the scatter diagram, however, cannot be overemphasized,
since it affords one very good check on the appropriateness of
the straight-line solution to the particular prediction problem.*

Figure 14.5 Scatter diagram, subsample means (□), and prediction line
for 50 pairs of aptitude-test scores and college freshman grade-point averages
of Table 14.3 (horizontal axis: X-Scale, Aptitude Test Score)

*Other methods of testing departure from linearity are available but are beyond the scope
of this text. See footnote, page 382.




It is also important that the student fully appreciate the precise nature 
of an estimate yielded by this solution to the prediction problem. It repre- 
sents an estimate of the mean of the Y-scores made by a subpopulation of 
individuals all of whom make the same X-score. Even if the obtained 
estimate is an accurate one (i.e., of approximately the same magnitude as 
the subpopulation mean), it still may or may not be a good estimate of a 
particular individual's Y-score, depending upon whether or not this indi- 
vidual's Y-score is located near the subpopulation mean. Moreover, the 
particular method of estimation involved is based on the assumption that 
the particular subpopulation mean is one of a family of such means all of 
which fall on the same straight line. The estimation of the particular sub- 
population mean is actually effected by first estimating the position of this 
line. This makes it possible to take into account past experience with 
individuals who do not belong to the particular subpopulation in question 
at the moment. If Mr. Jones, for example, had had to limit his estimate 
of the particular advisee's college success to his past experience with indi- 
viduals belonging only to the particular subpopulation involved, he would 
have had only two cases with which to work, since only two of the 50 in his 
experience pool had X-scores of 13. Instead, he was able to employ his 
entire pool of past experience to estimate the placement of the line which, 
in turn, provided the value of the particular subpopulation estimate re- 
quired. A full understanding of the fundamental nature of this solution to 
the prediction problem is essential to an intelligent application of it. Particularly,
it should serve to impress the student with how crucial the validity
of the assumption of linearity is to the success of the process.*


14.6 OTHER FORMS OF THE PREDICTION EQUATION
If we substitute the expression for c given in (14.5) into (14.1) we obtain

$$\hat{Y} = bX + \bar{Y} - b\bar{X}, \quad \text{or}$$

$$\hat{Y} = b(X - \bar{X}) + \bar{Y} \qquad (14.6)$$

Thus we see that if the X-score is expressed as a deviation from the
mean ($\bar{X}$) of all N of the X-scores, the predicted value ($\hat{Y}$) is the product
of the line's slope (b) times the X-score deviation (x) plus the mean ($\bar{Y}$)
of all N of the Y-scores. That is,

$$\hat{Y} = bx + \bar{Y} \qquad (14.6a)$$

Subtraction of $\bar{Y}$ from both members of (14.6a) gives

*When the relationship is curvilinear the prediction problem may be solved by fitting
an appropriate curve to the data. Consideration of the curvilinear problem is beyond
the scope of this text.




$$\hat{y} = bx \qquad (14.7)$$

That is, the predicted value expressed as a deviation from the over-all 
Y-mean is simply the product of the slope (b) times the deviation of the 
X-score from the over-all X-mean. 
We shall next consider a different form of expressing the slope (b) of
the prediction line. From (14.4) we may write

$$b = \frac{\sum xy}{\sqrt{\sum x^2}\,\sqrt{\sum x^2}} \cdot \frac{\sqrt{\sum y^2}}{\sqrt{\sum y^2}}$$

It will be observed that actually the value of b as given in (14.4) has
simply been multiplied by an expression equal to unity and, hence, has not
been changed in value. Now, if we rearrange the factors in the denominator
we obtain

$$b = \frac{\sum xy}{\sqrt{\sum x^2}\,\sqrt{\sum y^2}} \cdot \frac{\sqrt{\sum y^2}}{\sqrt{\sum x^2}}$$

The value of the first factor in this result is r [see (13.3)] and if we
divide numerator and denominator of the second factor by $\sqrt{N}$ these terms
become the Y and X standard deviations [see (6.5)]. Hence,

$$b = r\,\frac{s_Y}{s_X} \qquad (14.8)$$


Now substituting this result into (14.6) and (14.7) we obtain

$$\hat{Y} = r\,\frac{s_Y}{s_X}\,(X - \bar{X}) + \bar{Y} \qquad (14.9)$$

or

$$\hat{Y} = r\,\frac{s_Y}{s_X}\,x + \bar{Y} \qquad (14.9a)$$

and

$$\hat{y} = r\,\frac{s_Y}{s_X}\,x \qquad (14.10)$$

A particularly common form of the prediction equation is that given
in (14.9). Now dividing through both members of (14.10) by $s_Y$ we obtain

$$\frac{\hat{y}}{s_Y} = r\,\frac{x}{s_X}$$

or

$$\hat{z}_{\hat{Y}} = r z_X \qquad (14.11)$$

Thus we see that the deviation of the predicted value $\hat{Y}$ from the over-all
Y-mean in units of the over-all Y standard deviation is simply the product




of r times the X-value expressed in z-score units. In other words, if we were 
to take the trouble to convert all Y- and X-scores into z-units by application of

$$z_Y = \frac{Y - \bar{Y}}{s_Y} \quad \text{and} \quad z_X = \frac{X - \bar{X}}{s_X}$$

and if we then were to fit a line to the points of the scatter diagram of these
z-values, using the least-squares criterion of “goodness-of-fit,” we would 
obtain a line whose slope (i.e., b-value) was r and the intercept (i.e., the 
c-value) of which was zero. 
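This property, together with the relation between b and r given in (14.8), can be checked numerically. The sketch below uses hypothetical scores invented for illustration; the helper names are our own, and the standard deviation is computed with N in the denominator, as the derivation of (14.8) implies.

```python
from math import sqrt

# Hypothetical scores, invented for illustration only.
xs = [7, 8, 9, 10, 11, 12, 13, 14]
ys = [1.2, 0.9, 2.1, 1.8, 2.4, 2.2, 3.5, 3.4]


def mean(values):
    return sum(values) / len(values)


def std_dev(values):
    """Standard deviation with N (not N - 1) in the denominator."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


def z_scores(values):
    m, s = mean(values), std_dev(values)
    return [(v - m) / s for v in values]


def fit_least_squares(xs, ys):
    """Slope and intercept by (14.4) and (14.5)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum_xy / sum_xx
    return b, y_bar - b * x_bar


def correlation(xs, ys):
    """Product-moment correlation coefficient r."""
    x_bar, y_bar = mean(xs), mean(ys)
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_xx = sum((x - x_bar) ** 2 for x in xs)
    sum_yy = sum((y - y_bar) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)


r = correlation(xs, ys)
b_raw, c_raw = fit_least_squares(xs, ys)                  # raw-score line
b_z, c_z = fit_least_squares(z_scores(xs), z_scores(ys))  # z-score line

print(f"r = {r:.4f}, slope on z-scores = {b_z:.4f}, intercept on z-scores = {c_z:.4f}")
print(f"raw slope = {b_raw:.4f}, r * s_Y / s_X = {r * std_dev(ys) / std_dev(xs):.4f}")
```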

Except for the fact that the predicted value as given by (14.11) is
expressed in terms of a different scale, there is no difference in its meaning
or interpretation. It still represents an estimate of a subpopulation mean.
Now, however, the subpopulation scores involved are in terms of z-units,
and the predicted score may itself be interpreted as a type of standard
score, that is, as a deviation from a mean ($\bar{Y}$) in units of a standard deviation
($s_Y$). It is not, strictly speaking, however, a z-score, and because
certain aspects of the argument are useful in another connection it will be
instructive to note specifically why it is not. Suppose that for each of the
N individuals in the sample we obtained a predicted score by means of
(14.6a). The sum of these N scores may be expressed as follows:


$$\sum \hat{Y}_i = b\sum x_i + N\bar{Y}$$

or

$$\sum \hat{Y}_i = N\bar{Y} \qquad \text{[see (5.12) and (3.21)]}$$

Now dividing both members by N we obtain

$$M_{\hat{Y}} = \bar{Y} \qquad (14.12)$$


That is, the mean of the N predicted scores (i.e., the $\hat{Y}$'s) is the same
as the mean of the N actual Y-scores. Hence, the fact that the deviations
of the $\hat{Y}$-values were measured from $\bar{Y}$ instead of $M_{\hat{Y}}$ does not violate the
definition of a z-score.

Now suppose that for each of the N individuals in the sample we obtain
a predicted score in deviation form by application of (14.10) and that we
square each such score. The sum of the squares of these deviation scores
may be written

$$\sum \hat{y}_i^2 = r^2\,\frac{s_Y^2}{s_X^2} \sum x_i^2 \qquad \text{[see (3.19)]}$$

Dividing both members by N, so that each sum of squares becomes the
corresponding variance, we obtain

$$s_{\hat{Y}}^2 = r^2\,\frac{s_Y^2}{s_X^2}\,s_X^2 = r^2 s_Y^2 \qquad (14.13)$$

That is, the variance of the N predicted scores for the sample is the
product of the square of r times the variance of the N actual Y-scores.
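Results (14.12) and (14.13) lend themselves to a quick numerical check. The following sketch assumes hypothetical scores and uses helper names of our own; variances are computed with N in the denominator.

```python
from math import sqrt

# Hypothetical scores, invented for illustration only.
xs = [7, 8, 9, 10, 11, 12, 13, 14]
ys = [1.2, 0.9, 2.1, 1.8, 2.4, 2.2, 3.5, 3.4]


def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Variance with N in the denominator."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


x_bar, y_bar = mean(xs), mean(ys)
sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sum_xx = sum((x - x_bar) ** 2 for x in xs)
sum_yy = sum((y - y_bar) ** 2 for y in ys)

b = sum_xy / sum_xx               # (14.4)
c = y_bar - b * x_bar             # (14.5)
r = sum_xy / sqrt(sum_xx * sum_yy)

# Predicted score for every individual in the sample.
y_hat = [b * x + c for x in xs]

print(f"mean of predicted scores = {mean(y_hat):.4f}, mean of actual scores = {y_bar:.4f}")
print(f"variance of predicted = {variance(y_hat):.4f}, r^2 * variance of actual = {r ** 2 * variance(ys):.4f}")
```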




Since r is some value less than one (unless, of course, the relationship is 
perfect), r? is less than one, and it follows that the variance of the pre- 
dicted scores for the sample is some fraction (r?) of the variance of the 
actual Y-scores. Now to express the predicted scores in terms of a true 
z-scale, we should divide the deviations (i.e., the $\hat{y}$'s) by $s_{\hat{Y}}$. But to obtain
our $\hat{z}_{\hat{Y}}$ [see (14.11)] we divided these deviations by a different value,
namely, $s_Y$. It is for this reason that the values yielded by (14.11) are not
expressed in terms of a true z-scale. To remind the student of this fact, 
we placed the caret over both the z and its Y subscript in writing the left 
member of (14.11). 

In spite of the fact that the $\hat{z}_{\hat{Y}}$-values are not true z-scores, they do
represent a distance from a mean in units of a standard deviation and,
consequently, may be interpreted in much the same manner as ordinary
z-scores.* Since the mean and standard deviation used are those of the
actual Y-scores, the $\hat{z}_{\hat{Y}}$-values may, in fact, be interpreted as points on a
z-score scale established with reference to the actual Y-scores.

For the data of Table 14.3 (the problem of the high school counselor)
the values of the Y and X standard deviations and of r are 0.747, 1.735,
and .660 respectively. For these data, then, (14.9) is

$$\begin{aligned}
\hat{Y} &= (.660)\,\frac{0.747}{1.735}\,(X - \bar{X}) + \bar{Y} \\
&= .284(X - 10.1) + 2.248 \\
&= .284X - 2.868 + 2.248 \\
&= .284X - .620
\end{aligned}$$

as before.


Also for these data, (14.11) is simply

$$\hat{z}_{\hat{Y}} = .660\,z_X$$

And for the advisee whose X-score was 13,

$$z_X = \frac{13 - 10.1}{1.735} = +1.67$$

and

$$\hat{z}_{\hat{Y}} = (.660)(1.67) = 1.10$$


That is, the estimated mean of a subpopulation of individuals whose
$z_X$-scores are all +1.67 is above the obtained Y-mean by an amount equal
to 1.1 times the obtained Y standard deviation. This, of course, is the value
used by the counselor as an indication of the expected freshman-year
college performance of this particular advisee.
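The standard-score route can be retraced in code and converted back to the grade-point scale, where it should agree with the raw-score prediction of about 3.07 obtained in Section 14.5. This sketch uses only the summary values reported in the text; the variable names are our own.

```python
# Summary values as reported in the text for the data of Table 14.3.
x_bar, y_bar = 10.1, 2.248   # means of X and Y
s_x, s_y = 1.735, 0.747      # standard deviations of X and Y
r = 0.660                    # correlation between X and Y

x_advisee = 13

# Advisee's X-score in z-units, then the predicted standard score by (14.11).
z_x = (x_advisee - x_bar) / s_x
z_hat = r * z_x

# Converting back to the grade-point scale reproduces the form (14.9).
y_hat = y_bar + z_hat * s_y

print(f"z_X = {z_x:.2f}, predicted standard score = {z_hat:.2f}, predicted grade-point average = {y_hat:.2f}")
```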


*The standard values of the mean and standard deviation used with a true or ordinary
distribution of z-scores are zero and unity respectively. The corresponding standard
values of the distribution of $\hat{z}_{\hat{Y}}$-scores defined by (14.11) are clearly zero and r.




14.7 THE ACCURACY OF PREDICTION: THE
CORRELATION COEFFICIENT AS AN INDEX


The solution to the prediction problem presented in the foregoing 
sections involves the use of an estimated subpopulation mean as the 
predicted or expected value of the Y-variate for an individual member 
of that subpopulation. Granting that we obtain a quite accurate estimate 
of the mean of the subpopulation involved, it is still possible that the use 
of this estimated mean as a predicted Y-value for a given individual may 
be grossly in error simply because of the fact that this particular individual’s 
status with reference to the Y-trait is considerably removed from the 
subpopulation mean. If the actual Y-score of the individual involved 
is near the subpopulation mean, then our predicted Y-score will be quite 
accurate. If, on the other hand, the individual happens to be one of those 
members of the subpopulation whose actual Y-score is either considerably 
below or above the subpopulation mean, then our predicted Y-score will 
involve a rather large error component. 

Clearly then, the successful application of this prediction procedure 
to an individual depends upon the likelihood that the individual’s actual 
Y-status is somewhere near the Y-mean of the subpopulation of which this 
individual is a member. Obviously, the likelihood of an individual's 
Y-score being near the mean of the subpopulation to which he belongs is 
a function of the variability of the Y-scores comprising the subpopulation. 
If these scores are all very much alike in magnitude, that is, do not vary 
markedly, then the Y-score for a particular individual cannot differ mark- 
edly from the subpopulation mean even if it is one of the more extreme 
(lower or upper) scores of the subpopulation. On the other hand, if the 
Y-scores comprising the subpopulation vary widely in magnitude, then the 
likelihood that the Y-score for a particular individual will deviate substantially
from the subpopulation mean becomes much greater.

Now the tendency of the Y-scores comprising the subpopulations to be
concentrated about their respective means is reflected by the width of the
elliptical boundary of the scatter diagram. When this boundary is narrow,
the points for a subsample must necessarily lie close to the prediction line
and the Y-values for these points cannot differ very markedly from $\hat{Y}$.
If the boundary is wide, then at least some of the subsample points must
deviate rather markedly from the prediction line, and the use of $\hat{Y}$ (the
point on the line which provides the estimate of the subpopulation mean)
as a predicted score will result in rather gross errors in the case of at least
some members of the subpopulation. We have previously seen that the
width of the elliptical boundary of the scatter diagram is a function of the
degree of correlation between the X- and Y-variates. It follows, therefore, 
that the correlation coefficient may also be interpreted as an index of the 
accuracy with which our solution to the prediction problem may be applied. 




The larger the absolute value of r between X and Y (that is, the stronger
the relationship, be it positive or negative) the more accurate our predictions
will be.

The validity of r as an index of accuracy of prediction can be approached
in another way. If the prediction of Y given X can be effected with perfect
accuracy, then, of course, the predicted Y-score for any individual will be
the same as his actual Y-score, and hence, for a given group of individuals
the variance of the predicted Y-scores ($s_{\hat{Y}}^2$) will be the same as the variance
of the actual Y-scores ($s_Y^2$). If, on the other hand, knowledge of X is of no
help whatever in predicting Y, then the "best prediction" we can make
regarding any individual's Y-score is simply $\bar{Y}$, the mean Y-score for all
the individuals in our experience pool. This amounts simply to an application
of Rule 5.7 which states that the expected value of a score selected by
some chance (random) procedure from a score distribution is the mean of
the distribution. Since in this situation the predicted score will be the same
for any individual, the variance of the predicted scores for a given group of
individuals will be zero. As knowledge of X provides a basis for some
differentiation in predicting Y, the variance of the predicted scores becomes
some value greater than zero, and approaches that of the actual Y-scores
as a limit, as the accuracy of prediction approaches perfection. This
suggests the following definition of an index of accuracy of prediction.

DEFINITION. An index of the accuracy with which the prediction process
may be applied to the individuals of a given group is provided by the ratio of
the variance of their predicted Y-scores to the variance of their actual Y-scores.
I.e.,

$$\text{Index of accuracy of prediction} = \frac{s_{\hat{Y}}^2}{s_Y^2} \qquad (14.14)$$

But from (14.13) we see that the value of this ratio is $r^2$, or that

$$r^2 = \frac{s_{\hat{Y}}^2}{s_Y^2} \qquad (14.15)$$


Thus again we find that the magnitude of the correlation coefficient
is indicative of the accuracy of the prediction process, or, in other words, 
that the accuracy of the prediction process is a function of the degree of 
correlation between X and Y. It is not surprising, then, that the placement 
of the prediction line is in part a function of r [see (14.8), (14.9), and 
(14.11)*]. 

There is still a third way in which the validity of r as an index of the
accuracy of prediction can be demonstrated. Consider the correlation
between the actual Y-scores of the individuals of the experience pool and


*When the Y- and X-scores involved are expressed in standard-score form, the value of
r alone is sufficient to determine the placement of the line.




their respective predicted scores ($\hat{Y}$-scores). Such an r is clearly an index
of the accuracy of prediction, for the closer the agreement between the
actual and predicted Y-values, the larger the magnitude of this r becomes.
If the prediction is perfect, that is, if every predicted score equals the
corresponding actual score, then the value of this r must be unity. Similarly,
if there is no relationship whatever between actual and predicted
scores, that is, if the prediction process fails completely to yield accurate
predictions, then the value of this r becomes zero. Intermediate values
of this r are, of course, indicative of various degrees of relationship, that is,
of various degrees of agreement, between predicted and actual scores.

Now since the $\hat{Y}$-values are obtained from the X-values by multiplying
by a constant (b) and adding a constant (c) [see (14.2)], it follows from
rules 13.2 and 13.3 [see also (13.9)] that the correlation between Y and
$\hat{Y}$ is the same as the correlation between X and Y. That is,

$$r_{Y\hat{Y}} = r_{XY}$$

Hence, the remarks made regarding $r_{Y\hat{Y}}$ as an index of the accuracy of
prediction apply to $r_{XY}$. That is, $r_{XY}$ is indicative of agreement between
actual and predicted values in precisely the sense of $r_{Y\hat{Y}}$.
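The equality of the two correlations can be illustrated numerically. This is a sketch under stated assumptions: the scores are hypothetical, the function names are our own, and the data were chosen so that the fitted slope is positive.

```python
from math import sqrt

# Hypothetical scores with a positive relationship, invented for illustration.
xs = [7, 8, 9, 10, 11, 12, 13, 14]
ys = [1.2, 0.9, 2.1, 1.8, 2.4, 2.2, 3.5, 3.4]


def correlation(us, vs):
    """Product-moment correlation coefficient between two lists of scores."""
    u_bar = sum(us) / len(us)
    v_bar = sum(vs) / len(vs)
    sum_uv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    sum_uu = sum((u - u_bar) ** 2 for u in us)
    sum_vv = sum((v - v_bar) ** 2 for v in vs)
    return sum_uv / sqrt(sum_uu * sum_vv)


# Least-squares slope and intercept by (14.4) and (14.5).
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
c = y_bar - b * x_bar

# Predicted scores are a linear function of X, per (14.2).
y_hat = [b * x + c for x in xs]

r_xy = correlation(xs, ys)
r_y_yhat = correlation(ys, y_hat)
print(f"r between X and Y = {r_xy:.4f}; r between Y and predicted Y = {r_y_yhat:.4f}")
```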


14.8 THE ACCURACY OF PREDICTION: THE STANDARD
ERROR OF ESTIMATE AS AN INDEX


We have seen how the likelihood of gross errors in the application of 
the prediction process to individuals depends upon the variability of the 
actual scores for the subpopulations to which these individuals belong. 
This suggests the use of an estimate of the variance or the standard deviation
of the actual Y-scores of a subpopulation as an index of the accuracy
of the prediction process applied to its members: the smaller this variance,
the more accurate the prediction process. The difficulty with this proposal
lies in the fact that our experience pool may contain only a relatively few
individuals from any given subpopulation so that any estimate of the
variance of the Y-scores of this subpopulation may have to be based on a
very small sample and may consequently involve a large sampling error.
If, however, we can assume that the Y-scores for any one subpopulation
have the same variance as those for any other (that is, if we can assume
that all subpopulations of Y-scores are equally variable), then we can
employ all the information contained in our experience pool to obtain a 
much more precise estimate of this common subpopulation variance. 

Before considering how such an estimate might be made in the case 
of the problem at hand, we shall first consider the more general problem of 
estimating the common variance of k equally variable populations having 




different means given a random sample of n cases from each population.*
Using only the sample from Population 1 we can estimate the population
variance by application of (9.16). This gives

$$\tilde{\sigma}_1^2 = \frac{\sum_{i=1}^{n} (Y_{i1} - \bar{Y}_1)^2}{n - 1}$$


In this way we obtain a separate estimate of the common population 
variance from each population sample. In each case, we must measure
the score deviations from an estimate of the mean of the particular popula- 
tion involved, since the means of the populations differ. The estimates 
used for this purpose are, of course, the means of the particular samples. 

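Now since each of these individual variance estimates is an estimate
of the same population value, we can obviously obtain a more precise
estimate of this value by averaging these individual estimates. That is,

$$\tilde{\sigma}^2 = \frac{\tilde{\sigma}_1^2 + \tilde{\sigma}_2^2 + \cdots + \tilde{\sigma}_k^2}{k} = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n} (Y_{ij} - \bar{Y}_j)^2}{kn - k}$$

But kn, the number of samples (k) times the number of scores in each
(n), is the total number of scores in all the samples. Let this number be N.
Then,

$$\tilde{\sigma}^2 = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n} (Y_{ij} - \bar{Y}_j)^2}{N - k} \qquad (14.16)$$
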
Although the argument is beyond the scope of this text, it is possible 
to show that (14.16) is also applicable when the samples from the various 
populations differ in size as would ordinarily be true of the subpopulation 
samples in the prediction-problem situation. In this case, of course, a
subscript should be affixed to n.
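A pooled estimate of this kind, with samples of unequal size, might be computed as in the sketch below. The three samples are hypothetical and the function name `pooled_variance` is our own; the computation follows the form of (14.16), dividing the total within-sample sum of squares by N - k.

```python
def pooled_variance(samples):
    """Estimate a common population variance from several samples.

    Each sample's deviations are measured from that sample's own mean;
    the total sum of squares is divided by N - k, where N is the total
    number of scores and k is the number of samples.
    """
    total_n = sum(len(sample) for sample in samples)
    k = len(samples)
    sum_of_squares = 0.0
    for sample in samples:
        sample_mean = sum(sample) / len(sample)
        sum_of_squares += sum((y - sample_mean) ** 2 for y in sample)
    return sum_of_squares / (total_n - k)


# Hypothetical samples of unequal size, invented for illustration only.
samples = [
    [2, 4, 6],
    [1, 3],
    [5, 5, 8, 10],
]

estimate = pooled_variance(samples)
print(f"pooled variance estimate = {estimate:.4f}")
```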

Now it would be possible in the prediction-problem situation to use
(14.16) to obtain an estimate of the common subpopulation variance.
However, we can improve upon (14.16) in this situation by using more
precise estimates of the subpopulation means from which to measure the


*The notational scheme is as described in Section 3.9 except that the subscript may be
dropped from the sample size, since each sample is the same size, i.e., since $n_1 = n_2 = \cdots = n_k$.
Also Y instead of X has been used to designate score values.




Y-score deviations than are afforded by the subsample means. It is especi- 
ally important to take advantage of this possibility owing to the fact that 
in the prediction-problem situation many of the subsamples may be rela- 
tively small and their means consequently highly unreliable as estimates 
of subpopulation means. Therefore, instead of measuring the deviations 
of the Y-scores from their respective subsample means, we shall measure 
them from the more precise estimates of the subpopulation means which 
are provided by the points on the prediction line. This is the only modifi- 
cation we shall make in the computation of the numerator of (14.16). 


That is, instead of

$$\sum_{j} \sum_{i} (Y_{ij} - \bar{Y}_j)^2$$

we shall use

$$\sum_{j} \sum_{i} (Y_{ij} - \hat{Y}_j)^2$$


It will be observed that this sum is that which we previously designated 
by G and referred to as an index of the “goodness-of-fit” of a line to a given 
set of points [see (14.3)]. 

Mathematical statisticians have further shown* that when the deviations
of the Y-scores are thus measured, the appropriate denominator of
(14.16) becomes N - 2 rather than N - k. Hence, we have the following
formula for estimating the common value of the variances of the subpopulations
in the prediction problem situation:

$$\tilde{\sigma}_{y \cdot x}^2 = \frac{\sum_{j} \sum_{i} (Y_{ij} - \hat{Y}_j)^2}{N - 2} \qquad (14.17)$$

or

$$\tilde{\sigma}_{y \cdot x} = \sqrt{\frac{\sum_{j} \sum_{i} (Y_{ij} - \hat{Y}_j)^2}{N - 2}} \qquad (14.18)$$


The value of $\tilde{\sigma}_{y\cdot x}$ given by (14.18) may be thought of as an estimate
of the standard deviation of the actual Y-scores for any subpopulation of
individuals whose X-scores are all of the same magnitude. Of course, to
interpret it in this way requires that we assume the variability of the
Y-scores of any subpopulation to be the same as that of any other. This
condition is referred to as homoscedasticity.
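The computation described by (14.17) and (14.18) can be sketched in a few lines of Python. The score pairs below are hypothetical and are not taken from the text, and the function name is only an illustrative choice. The sketch fits the prediction line by (14.4) and (14.5), measures each Y-score from the point on that line rather than from its subsample mean, and divides the sum of squared errors by N − 2.

```python
import math

# Hypothetical (X, Y) pairs; two individuals share each X-score.
pairs = [(1, 2.0), (1, 3.0), (2, 2.5), (2, 4.0),
         (3, 4.5), (3, 5.0), (4, 5.5), (4, 7.0)]

def standard_error_of_estimate(pairs):
    """Return (b, c, sigma_tilde) for the least-squares line Y-hat = bX + c."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sum_xx = sum((x - mean_x) ** 2 for x, _ in pairs)
    b = sum_xy / sum_xx              # slope, as in (14.4)
    c = mean_y - b * mean_x          # intercept, as in (14.5)
    # Errors are measured from the prediction line, not from subsample means.
    sum_e2 = sum((y - (b * x + c)) ** 2 for x, y in pairs)
    return b, c, math.sqrt(sum_e2 / (n - 2))   # (14.18)

b, c, sigma_tilde = standard_error_of_estimate(pairs)
print(round(b, 3), round(c, 3), round(sigma_tilde, 3))
```

Interpreting the result as the standard deviation of every subpopulation presupposes homoscedasticity, exactly as the text requires.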


*Proof is beyond the scope of this text.


DEFINITION. If the Y-scores of any subpopulation of individuals making
a given X-score have the same degree of variability as those of any subpopulation 
of individuals making any other given X-score, then the condition of homo- 
scedasticity is said to hold. 


When $\tilde{\sigma}_{y\cdot x}$ is interpreted as an estimate of the standard deviation of a
subpopulation of Y-scores each of which is paired with the same X-score,
it may be regarded as an index of the accuracy with which the prediction
process may be applied to individuals. Since the "estimate" or "prediction"
we make for any member of a particular subpopulation is the $\hat{Y}$-value
for that subpopulation, the $Y - \hat{Y}$ deviations represent differences between
actual and "estimated" or "predicted" values and hence are measures of
the error in the "estimation" or prediction.* The value of $\tilde{\sigma}_{y\cdot x}$ is the
square root of a sort of mean value of the squares of these errors for the
individuals in the experience pool. The larger these errors, on the average,
the larger the value of $\tilde{\sigma}_{y\cdot x}$. It is for this reason that the value $\tilde{\sigma}_{y\cdot x}$ is
known as a standard error of estimate or more fully as the standard error of
estimating Y for a given X.

It is possible to use an estimate of the standard deviation of a sub- 
population of Y-scores as an index of the accuracy of individual estimates— 
that is, as a standard error of estimate—even if the condition of homo- 
scedasticity does not hold. In this case, however, the use of (14.18) is not 
appropriate and the estimate of the subpopulation standard deviation will 
necessarily have to be based on only those values comprising the subsample 
from that subpopulation. The magnitude of such a standard error of 
estimate will, of course, be meaningful only with reference to the particular 
subpopulation and a separate determination must be made in the case of 
each subpopulation. If the point field of the scatter diagram is elliptical, 
it is generally safe to assume that the condition of homoscedasticity holds. 
Figure 14.6 shows the boundaries of two hypothetical point fields, A and
B, for which the condition of homoscedasticity does not hold. In the case
of the A-plot, the standard errors of estimate will obviously be much
greater for the subpopulations of Y-values which are associated with small
X-values than for the subpopulations associated with large X-values.
Here predictions made for individuals having small X-scores are likely to
involve gross errors whereas those made for individuals having large
X-scores will, on the whole, be quite accurate. The situation is reversed
in the case of plot B. 

To facilitate the computation of $\tilde{\sigma}^2_{y\cdot x}$, we shall derive a formula for the
sum of squares in the numerator of (14.17). Note first that the operator
$\sum_{i=1}^{n_j}$ directs the summation of the $e^2$-values for the individuals of the j-th
subsample. The operator $\sum_{j=1}^{k}$ directs the summation of all such subsample
sums. This amounts simply to obtaining an $e^2$-value for each of the N
individuals in the entire experience pool and summing these N values.
Hence the double summation operator employed in (14.17) may be replaced
by the single operator $\sum_{i=1}^{N}$, or simply by $\sum$, where it is understood that all
N of the e-values are involved in the sum.

*It is for this reason we have designated the $Y - \hat{Y}$ difference by e.


[Figure 14.6. Boundaries of two hypothetical scatter-diagram point fields (A and B) for which the condition of homoscedasticity does not hold. Axes in each panel: X-scale (horizontal), Y-scale (vertical).]


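Now for any individual in the experience pool, say individual i,

$$
\begin{aligned}
e_i &= Y_i - \hat{Y}_i \\
    &= Y_i - bX_i - c && \text{[substituting for } \hat{Y} \text{ from (14.2)]} \\
    &= Y_i - bX_i - \bar{Y} + b\bar{X} && \text{[substituting for } c \text{ from (14.5)]} \\
    &= (Y_i - \bar{Y}) - b(X_i - \bar{X}) \\
    &= y_i - bx_i
\end{aligned}
$$

and

$$e_i^2 = y_i^2 + b^2x_i^2 - 2bx_iy_i$$

Now an expression like this can be obtained for each of the N individuals,
and summing these N expressions, we obtain

$$\sum e_i^2 = \sum y_i^2 + b^2\sum x_i^2 - 2b\sum x_iy_i \qquad \text{[see (3.19)]}$$

But from (14.4) we see that

$$\sum x_iy_i = b\sum x_i^2$$

Hence, substituting for $\sum x_iy_i$ we obtain

$$\sum e_i^2 = \sum y_i^2 + b^2\sum x_i^2 - 2b^2\sum x_i^2$$

or

$$\sum e_i^2 = \sum y_i^2 - b^2\sum x_i^2 \tag{14.19}$$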




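Or if we substitute from (14.4) for b we have

$$\sum e_i^2 = \sum y_i^2 - \frac{\left(\sum x_iy_i\right)^2}{\sum x_i^2} \tag{14.20}$$

Or if we multiply numerator and denominator of the last term of the right
member of (14.20) by $\sum y_i^2$ we obtain

$$\sum e_i^2 = \sum y_i^2 - r^2\sum y_i^2 \qquad \text{[see (13.3)]}$$

or

$$\sum e_i^2 = \sum y_i^2\,(1 - r^2) \tag{14.21}$$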


Either (14.19), (14.20), or (14.21) may be used to compute the value
of the error sum of squares, $\sum e_i^2$. The application of these formulas may be
illustrated using the data of the problem of the high school counselor (see
Table 14.3). In this problem $\sum y_i^2 = 27.8848$, $\sum x_i^2 = 150.5$, and b = .284.
Hence, application of (14.19) gives:

$$\sum e_i^2 = 27.8848 - (.284)^2(150.5) = 15.746$$

Also for these data $\sum x_iy_i = 42.76$. Hence, application of (14.20) gives:

$$\sum e_i^2 = 27.8848 - \frac{(42.76)^2}{150.5} = 15.736$$

Finally for these data $r^2 = .4357$. Hence, application of (14.21) gives:

$$\sum e_i^2 = 27.8848(1 - .4357) = 15.735$$

The differences in these results are due to rounding errors. Of the three
formulas, (14.19) is usually the most subject to rounding error.
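The three computations can be checked with a short Python sketch. It uses only the rounded summary values quoted in the text for the high school counselor problem; the variable names are illustrative assumptions.

```python
# Summary values quoted in the text (rounded as printed there).
sum_y2 = 27.8848   # sum of squared y-deviations
sum_x2 = 150.5     # sum of squared x-deviations
sum_xy = 42.76     # sum of cross products of deviations
b = 0.284          # slope of the prediction line
r2 = 0.4357        # squared correlation coefficient

sse_19 = sum_y2 - b ** 2 * sum_x2          # (14.19)
sse_20 = sum_y2 - sum_xy ** 2 / sum_x2     # (14.20)
sse_21 = sum_y2 * (1 - r2)                 # (14.21)

print(round(sse_19, 3), round(sse_20, 3), round(sse_21, 3))
```

The small disagreements among the three results arise because b and r² enter already rounded; (14.19) squares the rounded b and multiplies it by a large sum, which is why it drifts furthest.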
If we divide both sides of (14.21) by N we obtain

$$s^2_{y\cdot x} = s^2_y(1 - r^2) \tag{14.22}$$

or

$$s_{y\cdot x} = s_y\sqrt{1 - r^2} \tag{14.23}$$

where $s_{y\cdot x}$ is the standard error of estimate for the given experience pool
of N individuals. Some writers refer to $s_{y\cdot x}$ as the standard error of estimate
and advocate its use as an index of accuracy of the prediction process. This
process, however, is not needed for use with members of the experience
pool but rather for use with members of the subpopulations whose actual
Y-scores are unknown. It would appear, therefore, that the population
estimate of the standard error of estimate given by (14.18) provides a more
realistic assessment of error in the prediction process than does the sample
value of (14.23). If we divide both members of (14.21) by N − 2 instead
of N we obtain

$$\frac{\sum e_i^2}{N-2} = \frac{\sum y_i^2}{N-2}\,(1 - r^2)$$




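That is,

$$\tilde{\sigma}^2_{y\cdot x} = \frac{N}{N-2}\,s^2_y(1 - r^2) \tag{14.24}$$

or

$$\tilde{\sigma}_{y\cdot x} = s_y\sqrt{\frac{N}{N-2}}\,\sqrt{1 - r^2} \tag{14.25}$$

Again using the data of the problem of the high school counselor, application
of (14.22) gives

$$s^2_{y\cdot x} = (.5577)(1 - .4357) = .3147$$

and

$$s_{y\cdot x} = .56$$

Application of (14.24) gives

$$\tilde{\sigma}^2_{y\cdot x} = \frac{50}{48}(.5577)(1 - .4357) = \frac{50}{48}(.3147) = .3278$$

and

$$\tilde{\sigma}_{y\cdot x} = .57$$

Since $\frac{N}{N-2}$ will always be some value greater than one, it follows
that the population estimate of the standard error of estimate will be
larger than the corresponding sample value. Of course, as N becomes large
$\frac{N}{N-2}$ approaches one and the need for distinguishing between $s_{y\cdot x}$
and $\tilde{\sigma}_{y\cdot x}$ becomes of little practical importance.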
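A brief Python sketch of the comparison between the sample value and the population estimate follows. It assumes N = 50 and the rounded values of the Y variance and r² used in the text's illustration; the variable names are illustrative only.

```python
import math

n = 50            # size of the experience pool in the counselor problem
var_y = 0.5577    # variance of the Y-scores in the pool
r2 = 0.4357       # squared correlation coefficient

sample_var = var_y * (1 - r2)             # (14.22)
pop_var = n / (n - 2) * sample_var        # (14.24)
sample_se = math.sqrt(sample_var)         # (14.23)
pop_se = math.sqrt(pop_var)               # (14.25)

print(round(sample_var, 4), round(pop_var, 4))
print(round(sample_se, 2), round(pop_se, 2))

# The correction factor N / (N - 2) shrinks toward one as N grows.
for size in (10, 50, 500, 5000):
    print(size, round(size / (size - 2), 4))
```

The loop at the end illustrates the closing remark of the paragraph: for large experience pools the two indexes are practically indistinguishable.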

Since we have already shown how the accuracy of the prediction process
is a function of the degree of correlation between the two variates involved,
it is not surprising to find that the standard error of estimate is also a
function of this correlation. However, the standard error of estimate is
also a function of the over-all Y standard deviation ($s_y$) in such a way
as to give to it an advantage not possessed by the index r. We have seen
(Section 13.10) how r is affected by the "range-of-talent" encompassed by
the collection of individuals involved. If the use of $s_{y\cdot x}$ or $\tilde{\sigma}_{y\cdot x}$ is appropriate
at all, that is, if the condition of homoscedasticity holds, its value
is independent of the "range-of-talent" in the experience pool. If the
"range-of-talent" is increased, r tends to increase, making $\sqrt{1 - r^2}$ decrease.
In this case, however, the value of $s_y$ will also increase so that the
standard error of estimate is the product of an increasing and a decreasing
value [see (14.23) and (14.25)] and hence tends to remain constant for a
given pair of variates. The student can perhaps better visualize this
property of the standard error of estimate if he recalls that r increases as
the point field of the scatter diagram becomes elongated in relation to its
width, whereas the magnitude of $\tilde{\sigma}_{y\cdot x}$ reflects only the width of this point
field without regard to its length. Figure 14.7, for example, shows the
boundaries of the scatter-diagram point fields for two imaginary experience
pools involving the same variates. Boundary A applies to a collection
limited in "range-of-talent," whereas Boundary B applies to a collection
involving a much more extensive "range-of-talent." The value of r will be
much greater for the B than for the A collection. However, the variation
among Y-scores for given values of X tends to be the same for both plots
as is shown in the diagram by the ranges CD and C'D'. It is this variation
which is reflected by $\tilde{\sigma}_{y\cdot x}$, and the value of $\tilde{\sigma}_{y\cdot x}$ will consequently be the
same for both plots.

[Figure 14.7. Boundaries of two hypothetical scatter-diagram point fields of same width but differing in length. Axes in each panel: X-scale (horizontal), Y-scale (vertical). In both panels CD = C'D' = range of Y-scores for a given X.]
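The contrast between r and the standard error of estimate under a change in "range-of-talent" can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not data from the text: it generates a hypothetical homoscedastic pool with Python's `random` module (seeded so that it is repeatable) and then restricts it to individuals with X-scores near the mean.

```python
import math
import random

def r_and_se(pairs):
    """Return (r, population estimate of the standard error of estimate)."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    r = sxy / math.sqrt(sxx * syy)
    se = math.sqrt(syy * (1 - r * r) / (n - 2))   # (14.21) divided by N - 2
    return r, se

rng = random.Random(0)
wide = []
for _ in range(20000):
    x = rng.gauss(0, 1)
    # Same error spread at every X: homoscedastic by construction.
    wide.append((x, 0.6 * x + rng.gauss(0, 0.8)))

# Restricted "range-of-talent": keep only X-scores close to the mean.
narrow = [(x, y) for x, y in wide if abs(x) < 0.5]

r_wide, se_wide = r_and_se(wide)
r_narrow, se_narrow = r_and_se(narrow)
print(round(r_wide, 2), round(se_wide, 2))
print(round(r_narrow, 2), round(se_narrow, 2))
```

Under these assumptions the correlation drops sharply in the restricted pool while the standard error of estimate stays close to the error spread built into the simulation.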

The fact that $\tilde{\sigma}_{y\cdot x}$ is relatively independent of "range-of-talent" has
led some writers to advocate its use in preference to r as an index both of
degree of relationship and accuracy of prediction. However, it, too, has a
disadvantage as an index in that it is expressed in terms of Y-scale units.
This makes it impossible to use $\tilde{\sigma}_{y\cdot x}$ to compare the relative effectiveness
of two or more prediction situations unless the Y-scales are comparable.
The correlation coefficient, r, on the other hand, has the advantage of being
an abstract number independent of the units of measurement and, hence,
may be used as a basis for comparing the accuracy of different prediction
situations even though the units involved are not comparable.

It will be instructive to study the relationship between r and $s_{y\cdot x}$ for a
given "range-of-talent," that is, for a given value of $s_y$. It is clear from
(14.23) that $\sqrt{1 - r^2}$ is the proportion or fraction that the sample standard
error of estimate $s_{y\cdot x}$ is of the Y standard deviation ($s_y$). When r is zero,
this proportion is one, that is, the standard error of estimate equals the Y
standard deviation, and there is no improvement in the accuracy of prediction
resulting from knowledge of an individual's X-score. As r increases,
this proportion gradually decreases, becoming zero when r = 1. A standard
error of estimate of zero, of course, is indicative of errorless predictions.
Values of this proportion between one and zero are indicative of the extent
to which errors of estimate or prediction are reduced from their maximum
as indicated by $s_y$ to zero as a result of taking into account information
about the individuals' X-scores. For example, if the correlation between
Y and X is .80, this proportion ($\sqrt{1 - .80^2}$) is .6, indicating that $s_{y\cdot x}$ is 60
per cent of the maximum value which it would have been were r = 0 instead
of .80. That is, as a result of taking into account information about X
when r = .80, we reduce the "average"* magnitude of the prediction errors
by 40 per cent (100 − 60 = 40) over what they would be were we to ignore
this information. Figure 14.8 shows graphically the percentage of reduction
in the "average" magnitude of the estimation or prediction error which is
associated with various values of r. Inspection of this figure clearly shows
that r must become quite large before an appreciable percentage of reduction
is achieved. An r of .50, for example, reduces the "average" error of
estimate by only about 13.4 per cent, and an r of almost .98 is necessary to
bring about an 80 per cent reduction. It is for this reason that some writers
have advocated that the prediction or estimation process under consideration
should be employed only when the correlation between X and Y is
very high, say .90 or higher. It may indeed be advisable to avoid making
such predictions when r is low if it is possible to do so. Frequently, however,
a prediction cannot be avoided. If this is the case, it is better to make use
of the information about X than to ignore it, regardless of how slight the
gain in accuracy may be, that is, regardless of the fact that the correlation
between X and Y may be quite low.* It is of utmost importance, especially
in such circumstances, that the predictor be fully cognizant of the fallibility
of the procedure and that he interpret his results accordingly.

[Figure 14.8. Percentage of reduction in "average" magnitude of prediction errors for various values of r. Horizontal axis: r-scale, 0 to 1.00; vertical axis: 0 to 100.]

*"Average" in the sense that a standard deviation is a sort of "average."
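The percentages quoted in this paragraph follow directly from (14.23). As a minimal sketch (the function name is an illustrative assumption), the reduction in the "average" error for any r can be tabulated as follows.

```python
import math

def percent_reduction(r):
    """Percentage by which the standard error of estimate falls below s_y."""
    return 100 * (1 - math.sqrt(1 - r * r))

for r in (0.20, 0.50, 0.80, 0.90, 0.98):
    print(f"r = {r:.2f}: {percent_reduction(r):.1f} per cent")
```

Tabulating a few values in this way makes the slow initial rise of the curve evident: most of the reduction is concentrated at values of r above .90.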

In concluding this section attention is directed to the relationship 
between the variance of the Y-scores comprising the experience pool and 
the variances of the errors of estimate and of the predicted scores. From 
(14.8) we see that


or 


< 
tá 
Es 
] 


[see (14.13)] 


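$$\sum e_i^2 = \sum y_i^2 - \sum \hat{y}_i^2$$

or

$$\sum y_i^2 = \sum \hat{y}_i^2 + \sum e_i^2 \tag{14.26}$$

If we divide both members of (14.26) by N, we obtain

$$s^2_y = s^2_{\hat{y}} + s^2_{y\cdot x} \tag{14.27}$$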


That is, the variance of all the Y-scores in the experience pool is made up
of two component variances: (1) the variance of the corresponding estimated
Y-values ($\hat{Y}$'s), and (2) the variance of the errors in these estimated
values ($e = Y - \hat{Y}$).

*In situations involving the selection of a relatively small group of individuals from a
large number of individuals (e.g., in selecting from among GI personnel individuals to
attend a service academy), a considerable gain may be made through the use of a selection
test even though the correlation between success and the selection test is quite low
(say .30 or even .20), if the number of potentially successful individuals is small in relation
to the total group. The interested student will find a presentation of this point in:
J. P. Guilford, Fundamental Statistics in Psychology and Education (3d. ed.; New York:
McGraw-Hill Book Company, Inc., 1956), pp. 379 f.; and Lee J. Cronbach, Essentials
of Psychological Testing (New York: Harper & Brothers, 1949), pp. 256 f.
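The decomposition of the Y variance into a variance of estimated values and a variance of errors can be verified numerically. The six score pairs below are hypothetical and serve only as an illustration; the names are not from the text.

```python
# Hypothetical paired scores, chosen only for illustration.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 1.5, 3.5, 3.0, 5.0, 4.5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
c = mean_y - b * mean_x
preds = [b * x + c for x in xs]          # estimated Y-values

var_y = sum((y - mean_y) ** 2 for y in ys) / n
# The estimated values have the same mean as the actual Y-scores.
var_pred = sum((p - mean_y) ** 2 for p in preds) / n
var_err = sum((y - p) ** 2 for y, p in zip(ys, preds)) / n

print(round(var_y, 4), round(var_pred, 4), round(var_err, 4))
```

The identity holds exactly only when the line is the least-squares line; for any other line a cross-product term would remain.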


14.9 THE CONCEPT OF REGRESSION


The prediction equation in standard-score form [see (14.11)] indicates
clearly that for any situation in which r is less than one, an estimated or
predicted value deviates by a lesser amount from the over-all Y-mean in
units of the Y-standard deviation than does the corresponding X-value
from the over-all X-mean in units of the X-standard deviation. By way
of illustration, Figure 14.9 shows the point-field boundary of a hypothetical
scatter diagram in which the variates are expressed in z-score
form and in which r is assumed to be about .75. Points falling in the
heavy black column have $z_x$-score values of +2.0. The estimated $z_y$-mean
of this subpopulation is the corresponding point on the prediction
line ($\hat{z}_y = rz_x = .75 \times 2.0 = 1.5$). Since r is less than one, this point will
necessarily be nearer the origin (the intersection of the axes) than the
$z_x$-value of +2.0. The origin, or point at which $z_x$ and $z_y$ both equal zero,
locates the means of the $z_x$- and $z_y$-scales, and the unit-values of these
scales are one standard deviation. Hence, it follows that the mean of the
Y-scores of any subgroup of individuals making the same X-score lies
closer in terms of standard deviation units to the general Y-mean than does
the particular X-score value to the X-mean. This tendency for the subgroup
Y-means to regress toward the general or over-all Y-mean is known
as the regression effect or the phenomenon of regression.

[Figure 14.9. Diagram showing how the estimated $z_y$-mean for a subgroup of individuals whose $z_x$-scores are 2.0 is necessarily less than 2.0.]

While regression effect is mathematically inherent in our solution to 
the prediction problem, it is in no sense an artifact of that solution. The 
phenomenon is one we have all observed in the "real world," but which we 
have seldom attempted to describe in quantitative terms. Suppose, for 
example, that we consider a group of adults all of whom are 6 feet 6 inches 
tall. We would also expect to find these individuals to be above average in 
weight, but we would hardly expect them on the average to be as extreme 
in weight as in height. Or suppose that tests in general mathematical 
ability and in knowledge of contemporary affairs are administered to all 
freshmen in a large university. Now, if from the total group we were to 
select a number of individuals because they were very outstanding in 
their performance on the mathematics test, we would find that, while most
of these individuals would be above average in knowledge of contemporary 
affairs, only a few of them would be as far above average in this knowledge 
as in mathematical ability. That is, the mean score for these selected indi- 
viduals on the contemporary affairs test would be lower (when the scores 
are expressed in comparable terms, such as z-scores) than their scores on 
the mathematics test. This phenomenon of regression is characteristic of any 
two linearly correlated variates. 

A further graphic representation of this phenomenon will be helpful in 
arriving at a more exact understanding of its character. The two frequency 
curves in Figure 14.10 represent the distributions of measures of perform- 
ance on a scholastic aptitude test and of subsequent success in college for 
the same large group of individuals. Both distributions are plotted along 
comparable (z-score) scales. The X-distribution represents the distribution 
of aptitude test scores, and the Y-distribution that of success in college. 
We have assumed a correlation of .66 between these two variates since this 
was the value previously used in the problem of the high school counselor. 
Now consider a subgroup of individuals, all of whom make scores of +2.0
on the aptitude test. The estimated mean of the college-success scores for
this subgroup is (.66)(+2) or +1.32. A heavy line has been drawn from
+2.0 on the aptitude scale to +1.32 on the success scale. Note that this
line points inward toward the middle of the success distribution. That is,
the mean of the success scores for the members of this subgroup lies closer
to (has regressed toward) the general mean of the success distribution than
does their aptitude score (+2.0) to the general mean of the aptitude distribution.
Of course, there will be considerable variation in the success
scores of the members of this subgroup. In fact, the standard deviation
(standard error of estimate) of their success scores will be only about 25
per cent smaller than that of the total success distribution (see Figure 14.8).
This implies that the standard deviation of the subgroup success scores
will be .75 [see also formula (14.23)]. A hypothetical distribution curve
for these subgroup success scores having a mean of 1.32 and a standard
deviation of .75 has been sketched into Figure 14.10.* The large dots are
spaced at a distance of one standard deviation (.75) and the lines fanning
out from the aptitude score of 2.0 are intended to help the student picture
how different individuals making this particular aptitude score make
different success scores. While a few individuals make success scores above
the level of their aptitude score, most of them obviously achieve success
scores of less than 2, that is, success scores below the level of their aptitude
score.† In other words, there is an over-all regression effect when the subgroup
is considered as a whole.

[Figure 14.10. Diagram illustrating phenomenon of regression of Y-scores for a subgroup of individuals making the same X-score. Left scale: Aptitude (X); right scale: College Success (Y); r = .66; for $z_x = 2$, $\hat{z}_y = 2 \times .66 = 1.32$.]

*This distribution curve has been "highly magnified" in relation to the rest of the figure.
Theoretically its total area should be the same percentage of the area of the whole Y-distribution
that the number of individuals having $z_x$-scores of 2 is of the whole X-distribution.
If these distributions are assumed to be normal and the z-values determined
to the nearest 10th, this is approximately only one-half of one per cent.

†If we assume the subgroup success scores to be normally distributed,
$P\left(z < \frac{2 - 1.32}{.75}\right) = P(z < .91) = .82$.
That is, the success scores of some 82 per cent of the members of this subgroup will be
below the level of their aptitude scores.
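The figures quoted for this subgroup (a mean of 1.32, a standard deviation of about .75, some 82 per cent below the aptitude level, and a subgroup comprising about one-half of one per cent of the whole) can be checked with the normal distribution in Python's standard library. The sketch assumes normality throughout, as the footnotes do, and the variable names are illustrative.

```python
from statistics import NormalDist

r = 0.66
z_x = 2.0
pred_mean = r * z_x                  # estimated z_y-mean of the subgroup
sub_sd = (1 - r ** 2) ** 0.5         # sqrt(1 - r^2), with s_y = 1 for z-scores

# Proportion of the subgroup whose success z-score falls below 2.0.
subgroup = NormalDist(mu=pred_mean, sigma=sub_sd)
share_below = subgroup.cdf(z_x)

# Proportion of a normal X-distribution with z_x equal to 2.0
# when z-values are determined to the nearest tenth.
std = NormalDist()
share_in_interval = std.cdf(2.05) - std.cdf(1.95)

print(round(pred_mean, 2), round(sub_sd, 2))
print(round(share_below, 2), round(share_in_interval, 4))
```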


This picture suggests what would be found in the distributions of any
two positively and linearly related traits for any group. If the relationship
between the traits is perfect (i.e., if r = 1), then, for a subgroup of individuals
the lines which join their common X-trait standard score to their
Y-trait standard scores will merge into a single horizontal line, since in
this case each individual's Y-trait standard score will be the same as his
X-trait standard score. If the relationship is high but not perfect, these
lines will spread apart forming a relatively narrow fan, and the heavy line
(i.e., the line to the Y-trait mean for the subgroup) will be deflected (will
regress) only slightly toward the middle (general mean) of the Y-distribution.
If the relationship is very low but positive, these lines will fan out to
nearly all parts of the Y-distribution, and the heavy line will point more
sharply into the middle of that distribution. If the traits are wholly unrelated
(i.e., if r = 0), the lines will fan out through the whole of the
Y-distribution, and the Y-mean for the subgroup will coincide with the
general Y-mean. For example, if for a population of sixth-grade boys the
X-trait is height and the Y-trait intelligence* and lines are drawn from a
score interval near the lower end of the height distribution to the positions
of the corresponding intelligence scores, these lines would spread throughout
the entire intelligence distribution around a subgroup mean which would
coincide with the general mean in intelligence. This is the same as saying
that short persons are just as variable in intelligence and have the same
average intelligence as tall persons, or, for that matter, as the population
in general, regardless of differences in height.

If the relationship between the two variables is negative, the majority
of the lines from any one score interval in the X-distribution will go to the
opposite half of the Y-distribution, as will the heavy line extending to the
subgroup Y-mean. This subgroup mean will, nevertheless, still be nearer
the general Y-mean than the X-score interval is to the general X-mean.

In general, then, the higher the degree of correlation, the narrower will
be the fan-shaped pattern of lines drawn from score intervals in the X-distribution
to the corresponding scores in the Y-distribution, and the more
nearly horizontal will be the heavy line drawn to the subgroup Y-mean,
that is, the less will be the regression. Nevertheless, as long as the relationship
is not perfect, this heavy line will point inward, however slightly. In
other words, for individuals selected from a given group because they are
alike in one trait, the mean value of a second related trait will regress
toward the general mean of the second trait. The amount of this regression
is inversely related to the coefficient of correlation between the two measures.
With perfect correlation there is no regression. With zero correlation
the regression is complete, that is, the subgroup Y-means coincide with
the general Y-mean.

*The correlation between these traits is approximately zero.


14.10 REGRESSION TERMINOLOGY 


We have seen that if the variates are expressed in z-score form the
prediction line provides estimates of the means of $z_y$-scores for subpopulations
of individuals making the same $z_x$-score. Clearly, then, in this
situation the prediction line indicates the degree of regression along the
$z_y$-scale that is associated with a given $z_x$-value. This being the case it
would seem reasonable to call such a line a regression line. For this reason,
in fact, it has become customary in statistical literature to refer to all prediction
lines as regression lines and to all prediction equations as regression equations.
This terminology is employed even in situations involving curvilinearly
related variables, situations in which the concept of regression developed
in the preceding section is not meaningful. In conjunction with the use of
this terminology there has evolved a related terminology applicable to
other aspects of the prediction problem. Because this regression terminology
is so widely used, it is important that the student be familiar with it.

To start at the beginning, it is customary to refer to the topic of this
chapter as the regression problem instead of the prediction problem. Then,
as we have already indicated, equation (14.2) and other forms of it such as
are given by (14.6), (14.6a), (14.7), (14.9), (14.9a), (14.10), and (14.11) are
known as regression equations instead of prediction equations, and the lines*
represented by these equations are known as regression lines rather than
prediction lines. Since it is actually the slope of the line that indicates
regression, and since the slope is given by the value of b, it has become
customary to refer to b as a regression coefficient or a regression weight. The
predicted values are sometimes called regressed values, though they are also
commonly referred to as estimates or predictions.
The sum of squares of the deviations of the regressed (predicted) values
from the general Y-mean for all members of the experience pool is known as the
regression sum of squares or the sum of squares due to regression. This sum
of squares is often denoted symbolically by $ss_{reg}$. I.e.,

$$ss_{reg} = \sum \left(\hat{Y}_i - \bar{Y}\right)^2 \tag{14.28}$$

The sum of squares of the deviations of the actual Y-scores from the
general Y-mean for all members of the experience pool is known as the total
sum of squares, and is often denoted by $ss_T$. I.e.,

$$ss_T = \sum y_i^2 = \sum \left(Y_i - \bar{Y}\right)^2 \tag{14.29}$$

The deviations of the actual (Y) from the regressed ($\hat{Y}$) values are sometimes†
called residuals instead of errors of estimate, since they represent that
part of the total ($Y - \bar{Y}$) deviation which would remain were the regression
deviation ($\hat{Y} - \bar{Y}$) to be subtracted from it. (For an example, see Figure
14.11.) The sum of the squares of the residual deviations is sometimes
called the residual sum of squares and is denoted by $ss_{res}$. I.e.,

$$ss_{res} = \sum e_i^2 = \sum \left(Y_i - \hat{Y}_i\right)^2 \tag{14.30}$$

Translating (14.26) into terms of this notation we have

$$ss_T = ss_{reg} + ss_{res} \tag{14.31}$$

It is actually a common practice to compute $ss_{res}$ as a residual, that is, as
the difference between $ss_T$ and $ss_{reg}$.

[Figure 14.11. Diagram showing total deviation as consisting of regression deviation and residual deviation. For a point (X, Y): total deviation ($Y - \bar{Y}$), residual deviation ($Y - \hat{Y}$), regression deviation ($\hat{Y} - \bar{Y}$). Axes: X-scale (horizontal), Y-scale (vertical).]

*Except for a change in metric it is the same line which is involved.

†The term "error" which we have previously employed is also used by many writers.
From (14.17) we see that

$$\tilde{\sigma}^2_{y\cdot x} = \frac{ss_{res}}{N-2} \tag{14.32}$$

Or for just the individuals comprising the experience pool at hand we
have

$$s^2_{y\cdot x} = \frac{ss_{res}}{N} \tag{14.33}$$
Also, since 
B. Bom 
$*y- А? 


апа 


ssr (14.34) 


*It is also often called the error sum of squares 




Note also that the G-value of (14.3) is $ss_{res}$. That is, in regression
terminology, our criterion for the placement of the regression line is that
placement which minimizes the residual (error) sum of squares. Since $ss_T$
is a constant for a given experience pool, and since $ss_T$ is the sum of $ss_{res}$
and $ss_{reg}$, it follows that our method of placing the line is one which maximizes
the regression sum of squares.
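The criterion can be illustrated numerically. The sketch below uses hypothetical scores (not from the text) and illustrative names: it computes the three sums of squares for the least-squares line and then compares its residual sum of squares with that of a few other lines drawn through the point of means with slightly different slopes.

```python
def ss_res(xs, ys, b, c):
    """Residual (error) sum of squares for the line Y-hat = bX + c."""
    return sum((y - (b * x + c)) ** 2 for x, y in zip(xs, ys))

# Hypothetical paired scores, chosen only for illustration.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 1.5, 3.5, 3.0, 5.0, 4.5]

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
c = mean_y - b * mean_x

ss_t = sum((y - mean_y) ** 2 for y in ys)                # total
ss_reg = sum((b * x + c - mean_y) ** 2 for x in xs)      # regression
best = ss_res(xs, ys, b, c)                              # residual

# Lines through the point of means with perturbed slopes.
others = [ss_res(xs, ys, b + d, mean_y - (b + d) * mean_x)
          for d in (-0.2, -0.1, 0.1, 0.2)]

print(round(ss_t, 4), round(ss_reg, 4), round(best, 4))
print([round(v, 4) for v in others])
```

Every perturbed line yields a larger residual sum of squares than the least-squares line, which is the sense in which that line has the best "goodness-of-fit."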


14.11 PREDICTION OF X, GIVEN Y


In the foregoing sections we have represented the so-called independent
or predictor variable by X and the dependent or predicted variable by Y. 
The situation is usually such that only one of the variates can properly be 
regarded as the variable to be predicted. This variable, of course, should 
always be designated as the Y-variable. 

If for some reason it is desired to predict the value of a first variable
from information about a second as well as that of the second from information
about the first, the theory of the foregoing sections still applies. Now,
however, it is necessary to solve the problem twice, once with the first
variate as the Y-variate, and once with the second variate as the Y-variate.
For example, suppose the variables involved are intelligence and reading
ability. Ordinarily, we would be concerned with prediction of reading
ability given information about intelligence and we would solve the problem
designating reading ability as Y and intelligence as X. Occasionally, however,
we might encounter some individual for whom no intelligence score
was available and for whom some estimate of such a score was badly needed,
perhaps for quite some other purpose. If knowledge of this individual's
reading score is available we may use our reading-ability and intelligence
experience pool to obtain an estimate of his intelligence. Such use, however,
requires that we solve the problem a second time, designating intelligence
as the Y-variate.

Actually it is a simple matter to restate the theory of the foregoing
sections with the dependent variable designated by X instead of Y. After
all, the choice of designation is purely arbitrary. To translate all our previous
results into terms of a solution in which the dependent (predicted)
variable is assigned the label X and the independent (predictor) variable
the label Y, it is necessary only to change all the Y's of our previous results
to X's and all the X's to Y's. For example, we have seen (14.4) that the
value of b in (14.2) is given by

$$b_y = \frac{\sum x_iy_i}{\sum x_i^2}$$

Now if we wish to translate this formula into a form that is appropriate
for use with X instead of Y as the dependent variable we simply write

$$b_x = \frac{\sum x_iy_i}{\sum y_i^2} \tag{14.35}$$


Similarly the c of (14.2) for predicting Y is given by (14.5) as

$$c_y = \bar{Y} - b_y\bar{X}$$

To translate this formula into a form that is appropriate for use with X as
the dependent variable we write

$$c_x = \bar{X} - b_x\bar{Y} \tag{14.36}$$

Note that the b-value in (14.36) is that given by (14.35) and not that given
by (14.4). Now using (14.35) and (14.36) we obtain the following prediction
equation for predicting X, given Y.

$$\hat{X} = b_xY + c_x \tag{14.37}$$


All other results (formulas) of the foregoing sections may be similarly 
translated into terms of X as the dependent and Y as the independent 
variable by this simple interchange of the X and Y symbols. 

It is important that the student appreciate the fact that the regression 
line for predicting X is an entirely different line from the one for predicting 
Y. That is, the X-regression equation cannot be derived from the Y- 
regression equation simply by solving the latter for X in terms of Y. The 
equation resulting from such a solution is still the equation, though in a
different form, for the Y-regression line. That is, it is still the locus of
Y-means made by subgroups of individuals making the same X-score. To
predict X, we need the locus of the X-means made by subgroups of
individuals making the same Y-score. These subgroup X-means are not located
along the same line as the subgroup Y-means.* In other words, a different
regression or prediction line is used to predict X, given Y, than is used to
predict Y, given X.† This different line is the one given by formulas (14.35),
(14.36), and (14.37).
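The distinction between the two regression lines can be checked numerically. The following is only an illustrative sketch: it borrows the deviation sums quoted later in the text for the high school counselor's data (Table 14.3), and the variable names are our own, not the book's.

```python
import math

# Deviation sums quoted in the text for the counselor's data (Table 14.3).
sum_x2, sum_y2, sum_xy = 150.5, 27.8848, 42.76

b_y = sum_xy / sum_x2   # slope for predicting Y from X, as in (14.4)
b_x = sum_xy / sum_y2   # slope for predicting X from Y, as in (14.35)

# Solving the Y-regression equation for X yields the slope 1 / b_y, not b_x.
inverted_slope = 1.0 / b_y

r = sum_xy / math.sqrt(sum_x2 * sum_y2)
print(round(b_y, 3), round(b_x, 3), round(inverted_slope, 3), round(r, 3))
```

Since the product of the two slopes equals r squared, the slope obtained by inverting the Y-regression equation agrees with the X-regression slope only when the correlation is perfect, which is the exception noted in the first footnote.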


*Unless, of course, the correlation is perfect. 


†In this connection it is suggested that the student reread Section 13.6,
noting particularly Figure 13.7 and Table 13.19.




15 


SAMPLING-ERROR THEORY 
FOR LINEAR REGRESSION 
AND CORRELATION 


15.1 INTRODUCTION: THE REGRESSION MODEL 


In this chapter we shall treat briefly some simple sampling-error theory 
regarding b and r and show how this theory may be applied both to the 
testing of certain statistical hypotheses about b and r and to the determina- 
tion of interval estimates. We need first to describe the models to which 
this sampling theory applies. In this section we shall describe the model
for the prediction or regression situation.

We shall begin with a review of the nature of the data at hand. We
presumably have an experience pool of N individuals for each of whom
two scores, X and Y, are available. The X-score is the independent or
predictor variable and the Y-score the dependent or "to-be-predicted"
variable. Not all N of the X-scores are different in magnitude. Assume
that in all there are k different X-values, where k is some integer less than
N. Let these values be designated $X_1, X_2, \ldots, X_j, \ldots, X_k$, and let the
number of $X_1$-scores be represented by $n_1$, the number of $X_2$-scores by $n_2$,
etc. Then

$$\sum_{j=1}^{k} n_j = N \tag{15.1}$$




Now the $n_1$ individuals making X-scores of magnitude $X_1$ do not all
make the same Y-score. We shall regard the $n_1$ scores that comprise the
subset of Y-scores made by these individuals as having been selected at
random from a subpopulation of Y-scores for individuals whose X-scores
are all of magnitude $X_1$. Similarly, we shall regard the $n_2$ scores that
comprise the second subset of Y-scores as a random sample from a
subpopulation of Y-scores for individuals whose X-scores are all of magnitude $X_2$.
We shall regard all the remaining subsets of Y-scores associated with like
X-scores in a similar fashion. Thus, we consider our experience pool as
consisting of random subsamples of Y-scores, each having been selected from
a different population which is characterized by the fact that all its
members have the same X-score.

Now in solving the prediction problem we made one further assumption 
regarding the nature of these subpopulations. Namely, we assumed their 
means to fall on a straight line. The form of the equation of this line which 
we shall consider here is that given by (14.6). That is, 


$$\hat{Y} = b(X - \bar{X}) + \bar{Y} = bx + \bar{Y} \tag{15.2}$$

In this form of the equation, the independent variable X is expressed
as a deviation from $\bar{X}$, b is the slope, and $\bar{Y}$ the Y-intercept. To obtain this
equation it is necessary only to apply (14.4) and (5.1) to the data in the
sample experience pool to determine the values of b, $\bar{X}$, and $\bar{Y}$.

Suppose now that we repeat the procedure with a second sample
experience pool selected by choosing at random $n_1$ Y-scores from the
subpopulation of individuals whose X-scores are of magnitude $X_1$, $n_2$ Y-scores
from the subpopulation of individuals whose X-scores are of magnitude
$X_2$, etc. Then the X-scores for the N individuals comprising this new
experience pool will be the same as before and consequently the value of $\bar{X}$
will remain the same. Owing to the operation of chance, however, the
Y-scores comprising the subsamples will differ somewhat from those of the
original experience pool. Hence, the new values of b and $\bar{Y}$ may be expected
to differ from those previously determined.

Now assume this particular sampling procedure to be repeated
indefinitely. Each repetition will, of course, result in the same $\bar{X}$-value.
The values of b and $\bar{Y}$, however, will vary from sample to sample. The
relative frequency distributions of these b- and $\bar{Y}$-values are the sampling
distributions of the statistics b and $\bar{Y}$ (see definition, page 240). These
sampling distributions represent the theoretical totality of experience with
b and $\bar{Y}$ for experience pools selected from the subpopulations in the
manner described. By imposing two additional conditions on the nature
of the subpopulations of Y-scores, it is possible to derive mathematical
curves which serve as models of the sampling distributions of b and $\bar{Y}$.
These conditions are: (1) homoscedasticity, i.e., equal variability of
Y-scores from subpopulation to subpopulation*; and (2) normality of the
subpopulation Y-score distributions.
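The repeated-sampling idea described above can be imitated by simulation. The sketch below is illustrative only: the X-scores, the population slope and intercept, and the common subpopulation standard deviation are all hypothetical values chosen for the demonstration, not data from the text. It holds the X-scores fixed, redraws the Y-scores from normal subpopulations of equal variability, and collects the resulting values of b and the Y-mean.

```python
import random
import statistics

def simulate(x_scores, beta, mu_y, sigma, reps, seed=0):
    """Repeat the sampling routine: X-scores stay fixed, Y-scores are redrawn."""
    rng = random.Random(seed)
    n = len(x_scores)
    x_bar = sum(x_scores) / n
    dev = [x - x_bar for x in x_scores]
    sum_x2 = sum(d * d for d in dev)
    slopes, y_means = [], []
    for _ in range(reps):
        # Each Y is drawn from a normal subpopulation whose mean lies on the
        # line and whose standard deviation is the same for every X.
        ys = [rng.gauss(mu_y + beta * d, sigma) for d in dev]
        y_means.append(sum(ys) / n)
        # b = sum(x * y) / sum(x^2); because the deviations x sum to zero,
        # the raw Y-scores may be used in the numerator.
        slopes.append(sum(d * y for d, y in zip(dev, ys)) / sum_x2)
    return slopes, y_means, sum_x2

# Hypothetical X-scores for an experience pool of 17 individuals.
x_scores = [7, 8, 8, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 12, 12, 13, 14]
slopes, y_means, sum_x2 = simulate(x_scores, beta=0.3, mu_y=2.2, sigma=0.6, reps=4000)

print(statistics.mean(slopes), statistics.stdev(slopes), 0.6 / sum_x2 ** 0.5)
print(statistics.mean(y_means), statistics.stdev(y_means), 0.6 / len(x_scores) ** 0.5)
```

In each printed line the second and third figures should be close to one another: the observed spread of the simulated statistic against the standard deviation of the subpopulations divided by the square root of the sum of squared X-deviations (for b) or by the square root of N (for the Y-mean).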

We shall now summarize the foregoing description using a different 
order of presentation. That is, we shall begin with a statement about the 
population. The total population consists of k subpopulations of indi- 
viduals for each of whom there is an X-score and a Y-score. The X-scores 
differ in magnitude from subpopulation to subpopulation but are of the 
same magnitude for all individual members of any given subpopulation. 
The Y-scores differ in magnitude from individual to individual within the
same subpopulation. The subpopulation Y-scores are all (1) normally
distributed, (2) equally variable, and (3) have means falling on a straight
line. This third condition may be stated symbolically as follows. Let
$\mu_{y \cdot x}$ represent the Y-mean of a subpopulation. Then

$$\mu_{y \cdot x} = \beta x + \mu_Y \tag{15.3}$$


The sample experience pool is formed by selecting at random $n_1$
individuals from the first subpopulation, $n_2$ individuals from the second, etc.
The theoretical repetitions of this sampling procedure all involve the
random selection of these same numbers of individuals from the same
subpopulations. Thus, it follows that the over-all X-score distributions
are the same for all repetitions of the sampling routine. Only the Y-scores
vary from sample to sample and, hence, only those statistical indexes
(b, $\bar{Y}$, and r) which involve Y-score values are subject to sampling error.
This implies that tests of any statistical hypotheses, or that any statistical
estimates, based on a sample experience pool properly apply only to that
total population which is composed of the particular k subpopulations from
which the subsamples are presumed to have been selected.

It may appear that this method of sampling is not too appropriate
because of the fact that in most practical applications the individuals
comprising a sample experience pool are selected as a simple random sample
from the total population, thus making the X-characteristic as well as the
Y-characteristic subject to random sampling fluctuation. Actually,
however, in the prediction (regression) problem, our concern is with the
estimation of subpopulation Y-means. If, as assumed, these means do lie on a
straight line, the placement of this line is independent of the particular
subpopulations involved. In fact, samples from any two of them would provide
us with a basis for estimating the placement of the line. How the particular
subpopulations to be studied are selected is, then, not a matter of practical
concern. Granting the condition of linearity, they may be selected
arbitrarily or at random as, in effect, is the case when individuals comprising
the sample pool are selected at random from the total population.


*This condition is the same as that previously imposed in connection with
the use of the standard error of estimate as an index of the accuracy of the
prediction process. (See Section 14.8; note particularly the definition on
page 428.)




15.2 THE SAMPLING DISTRIBUTIONS OF b AND $\bar{Y}$


In this section we shall simply present, without mathematical (or,
for that matter, intuitive) justification, the descriptions of the theoretical
sampling distributions of b and $\bar{Y}$ as they have been determined by
mathematical statisticians. It will be possible for the student to apply this theory
in testing hypotheses and in making interval estimates even though he is
not prepared to understand its mathematical basis.

Given an infinity of sample experience pools selected from a set of 
subpopulations of the type described and in the manner described in the 
preceding section. Then: 


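RULE 15.1. The sampling distribution of b is a normal distribution with
mean $\beta$ [see (15.3)] and variance

$$\sigma_b^2 = \frac{\sigma_{y \cdot x}^2}{\sum x_i^2} \tag{15.4}$$

RULE 15.1a. The standard error of the sampling distribution of Rule
15.1 is

$$\sigma_b = \frac{\sigma_{y \cdot x}}{\sqrt{\sum x_i^2}} \tag{15.5}$$

The standard error of b may be estimated by using $\tilde{\sigma}_{y \cdot x}$ as given by
(14.25) in place of $\sigma_{y \cdot x}$ in (15.5). That is,

$$\tilde{\sigma}_b = \frac{\tilde{\sigma}_{y \cdot x}}{\sqrt{\sum x_i^2}} \tag{15.6}$$

Or, if we incorporate instructions for finding $\tilde{\sigma}_{y \cdot x}$ as given by (14.18)
and (14.20) we have

$$\tilde{\sigma}_b = \sqrt{\frac{\sum y_i^2 - \dfrac{(\sum x_i y_i)^2}{\sum x_i^2}}{(N-2)\sum x_i^2}}$$

or

$$\tilde{\sigma}_b = \sqrt{\frac{(\sum x_i^2)(\sum y_i^2) - (\sum x_i y_i)^2}{(N-2)(\sum x_i^2)^2}} \tag{15.7}$$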


Formula (15.7) may be used as a computing formula. For example, in
the problem of the high school counselor (see Table 14.3) $\sum x_i^2 = 150.5$,
$\sum y_i^2 = 27.8848$, $\sum x_i y_i = 42.76$ and N = 50. Hence, application of (15.7)
gives

$$\tilde{\sigma}_b = \sqrt{\frac{(150.5)(27.8848) - (42.76)^2}{(50-2)(150.5)^2}} = .0467$$
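The computing formula lends itself to a short routine. The following is a minimal sketch, with function names of our own choosing, that evaluates (15.7) directly and also by the two-step route of (15.6), in which the estimated standard error of estimate is found first.

```python
import math

def se_slope(sum_x2, sum_y2, sum_xy, n):
    """Estimated standard error of b, following computing formula (15.7)."""
    numerator = sum_x2 * sum_y2 - sum_xy ** 2
    return math.sqrt(numerator / ((n - 2) * sum_x2 ** 2))

def se_slope_two_step(sum_x2, sum_y2, sum_xy, n):
    """Same quantity via (15.6): estimate the standard error of estimate first."""
    residual_ss = sum_y2 - sum_xy ** 2 / sum_x2    # sum of squared residuals
    sigma_yx = math.sqrt(residual_ss / (n - 2))    # estimated std. error of estimate
    return sigma_yx / math.sqrt(sum_x2)

# Sums quoted in the text for the counselor's data (Table 14.3).
direct = se_slope(150.5, 27.8848, 42.76, 50)
two_step = se_slope_two_step(150.5, 27.8848, 42.76, 50)
print(round(direct, 4), round(two_step, 4))
```

The two routes are algebraically identical, so any disagreement beyond rounding would indicate an arithmetic slip.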


The estimated standard error of the sampling distribution of b as given
by (15.6) or its equivalent (15.7) is appropriate for use in computing the
t-statistic as defined by (12.1). That is, if $\beta$ is the population value of the
slope of the prediction (regression) line, then

$$t = \frac{b - \beta}{\tilde{\sigma}_b}, \qquad df = N - 2 \tag{15.8}$$

To see that the number of degrees of freedom for $\tilde{\sigma}_b$ and consequently
for the t of (15.8) is N − 2, refer to (15.6). Note that $\tilde{\sigma}_b$ as given by (15.6)
is a function of $\tilde{\sigma}_{y \cdot x}$ which in turn is based on the deviations of the N
Y-scores from the sample prediction line. The measurement of these
deviations, of course, required the placement of this line, that is, the
determination of its slope b and intercept c as auxiliary values derived from the
sample. Hence, by the rule for determining df (see page 341) we have
df = N − 2.

Formula (15.8) is of considerable practical importance since it indicates
that t may be used as a test statistic in testing any specific hypothesis
about the value of $\beta$, or in establishing an interval estimate of $\beta$. Before
illustrating such applications of (15.8) we shall consider the sampling
distribution of $\bar{Y}$.

Again we have given an infinity of sample experience pools selected
from a set of subpopulations of the type described and in the manner
described in Section 15.1. Then:

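RULE 15.2. The sampling distribution of $\bar{Y}$ is a normal distribution with
mean $\mu_Y$ [see (15.3)] and variance

$$\sigma_{\bar{Y}}^2 = \frac{\sigma_{y \cdot x}^2}{N} \tag{15.9}$$

RULE 15.2a. The standard error of the sampling distribution of Rule
15.2 is

$$\sigma_{\bar{Y}} = \frac{\sigma_{y \cdot x}}{\sqrt{N}} \tag{15.10}$$

The standard error of $\bar{Y}$ may be estimated by using $\tilde{\sigma}_{y \cdot x}$ as given by
(14.25) in place of $\sigma_{y \cdot x}$ in (15.10). That is,

$$\tilde{\sigma}_{\bar{Y}} = \frac{\tilde{\sigma}_{y \cdot x}}{\sqrt{N}} \tag{15.11}$$

Or if we incorporate instructions for finding $\tilde{\sigma}_{y \cdot x}$ as given by (14.18)
and (14.20) we have

$$\tilde{\sigma}_{\bar{Y}} = \sqrt{\frac{\sum y_i^2 - \dfrac{(\sum x_i y_i)^2}{\sum x_i^2}}{N(N-2)}}$$

or

$$\tilde{\sigma}_{\bar{Y}} = \sqrt{\frac{(\sum x_i^2)(\sum y_i^2) - (\sum x_i y_i)^2}{N(N-2)\sum x_i^2}} \tag{15.12}$$

Formula (15.12) may be used as a computing formula. For example,
using the data of the problem of the high school counselor (see Table 14.3)
we have

$$\tilde{\sigma}_{\bar{Y}} = \sqrt{\frac{(150.5)(27.8848) - (42.76)^2}{(50)(50-2)(150.5)}} = .081$$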
As was true of the estimated standard error of b, this estimated
standard error of $\bar{Y}$ has N − 2 degrees of freedom and is appropriate for use in
computing the t-statistic as defined by (12.1). That is, if $\mu_Y$ is the
population value of the Y-intercept of the prediction line, then

$$t = \frac{\bar{Y} - \mu_Y}{\tilde{\sigma}_{\bar{Y}}}, \qquad df = N - 2 \tag{15.13}$$

This t may be used in testing any hypothesis about the value of $\mu_Y$,
or in establishing an interval estimate of $\mu_Y$.


15.3 TESTING HYPOTHESES ABOUT $\beta$ AND $\mu_Y$: EXAMPLES


We shall use the situation and data of the problem of the high school 
counselor to illustrate tests of statistical hypotheses about the values of 
$\beta$ and $\mu_Y$. First we shall consider $\beta$. This parameter is actually the more
important of the two because of the effect of its magnitude upon the
accuracy of the prediction process. From (14.8) we see that

$$r = b\,\frac{S_X}{S_Y} \tag{15.14}$$

or, for the total population

$$\rho = \beta\,\frac{\sigma_X}{\sigma_Y} \tag{15.15}$$

That is, the population correlation, $\rho$ (rho), is in part a function of the
population slope, $\beta$. Now we have previously learned how the correlation
coefficient may be interpreted as an index of the accuracy of the prediction
process (see Section 14.7 and Figure 14.8). If $\beta$ has the value zero, then
$\rho$ is zero, and information about X is of no assistance whatever in
predicting the value of Y. Consequently, a test of the hypothesis that $\beta = 0$
is of considerable interest, for unless this hypothesis can be rejected the use
of the prediction equation is completely fruitless.* Hence, we shall show
how the high school counselor would test the hypothesis that $\beta = 0$.

STEP 1. H: $\beta = 0$; alternative: $\beta > 0$


Comment. It is not plausible that the scholastic aptitude test correlate
negatively with college achievement. Consequently $\beta$ cannot be negative†
and the counselor appropriately tests the hypothesis against the single
alternative cited.

*The test of the hypothesis that $\beta = 0$ is often referred to as a test of the
significance of b.

†The signs of $\rho$ and $\beta$ must be the same since $\sigma_X$ and $\sigma_Y$ are positive
[see (15.15)].

STEP 2. $\alpha = .01$

Comment. The counselor reasons that it would be far more costly to
misadvise on the basis of a worthless prediction equation (i.e., to make a
Type I error) than not to use an equation capable of improving to some
extent, at least, the soundness of his advice (i.e., to make a Type II error).
For this reason he elects a small level of significance.


STEP 3. R: $t \geq +2.42$

Comment. The counselor's experience pool contained 50 pairs of
scores (i.e., N = 50). Hence, the number of degrees of freedom for the
test statistic involved is 48 (i.e., N − 2). Table VI, Appendix C does
not include data for the distribution of t for df = 48. We have previously
intimated (see page 342) that when df > 30 the t-values may be interpreted
as unit normal deviates (i.e., as z-values). In this example, however, we
have had the counselor use the t-distribution for df = 40, that is, that
tabled distribution having the largest df which is smaller than the actual
df involved. This actually amounts to using a level of significance slightly
smaller than .01.


STEP 4. Calculation of t for data at hand.

b = +.284 (see page 417)
$\tilde{\sigma}_b = .0467$ (see page 446)

$$t = \frac{.284 - 0}{.0467} = +6.08$$


STEP 5. Reject. (Why?)

Comment. Rejection of the hypothesis $\beta = 0$ implies acceptance of
the only alternative $\beta > 0$. This means that some improvement in accuracy
of prediction may be expected through use of the prediction equation.
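The five steps of this test reduce to a few lines of arithmetic. The sketch below is illustrative only; the function name is our own, the standard error is the rounded value obtained from (15.7) for these data, and the critical value is the tabled one-tailed point for df = 40 used in Step 3.

```python
def t_statistic(estimate, hypothesized, standard_error):
    """t as in (15.8): (statistic - hypothesized parameter) / estimated std. error."""
    return (estimate - hypothesized) / standard_error

b = 0.284            # sample slope for the counselor's data
se_b = 0.0467        # estimated standard error of b from (15.7), rounded
critical_t = 2.42    # tabled point for df = 40, alpha = .01, one tail

t = t_statistic(b, 0.0, se_b)
reject = t >= critical_t   # the critical region lies in the upper tail only
print(round(t, 2), reject)
```

Because the alternative is one-sided, only a large positive t leads to rejection; a large negative t would fall outside the critical region as it is defined here.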

We next consider the parameter $\mu_Y$. In most practical work a relevant
hypothesis regarding the value of $\mu_Y$ is not likely to exist. We have, in fact,
included a description of the sampling distribution of $\bar{Y}$ more because of
its importance in other situations to be considered later than because of its
usefulness in testing hypotheses about $\mu_Y$. To complete our example we
shall, nevertheless, have the counselor test the hypothesis that, for the total
population of individuals whose X-scores are like those for the experience
pool at hand, the mean freshman-year college grade-point average is 2.0
(i.e., a C average). His purpose here may be regarded as simply one of
determining whether or not the average level of freshman-year college
achievement for the population from which his experience pool may be
regarded as having been selected is on a par with that which is generally
thought of as "average" by college administrators.

STEP 1. H: $\mu_Y = 2.0$; alternatives: $\mu_Y < 2.0$; $\mu_Y > 2.0$.


Comment. The counselor has no reason to believe that the actual
value of $\mu_Y$ cannot fall below as well as above the value hypothesized for it.
Hence, both alternatives must be taken into account by the test.

STEP 2. $\alpha = .05$

Comment. No particularly critical decisions hinge on the outcome
of this test and there is not much choice between the respective
consequences of the two types of error. The counselor has selected a sort of
neutral or compromise level of significance.

STEP 3. R: $t \leq -2.02$ and $t \geq +2.02$.

Comment. As in the previous example the counselor used the
t-distribution for df = 40. Here a two-ended R is necessary to detect possible
falsity of H in either direction.

STEP 4. Calculation of t for data at hand.

$\bar{Y} = 2.248$ (see Table 14.3)
$\tilde{\sigma}_{\bar{Y}} = .081$ (see page 448)

$$t = \frac{2.248 - 2.0}{.081} = +3.06$$

STEP 5. Reject. (Why?)

Comment. This decision also implies rejection of the alternative
$\mu_Y < 2.0$. (Why?) Hence, the only remaining possibility is that $\mu_Y > 2.0$.
This indicates that on the average the population with which the counselor
deals (i.e., from which his experience pool was taken) performs at a
somewhat higher level than is usually thought of as "average" for college
freshmen by college administrators.


15.4 TESTING THE HYPOTHESIS THAT $\rho = 0$


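It is clear from (15.15) that to test the hypothesis that $\beta = 0$ is the
equivalent of testing the hypothesis that $\rho = 0$, that is, the hypothesis that
in the population there is no relationship between the X- and Y-variates.
This hypothesis is often of interest in studies of relationship that do not
involve prediction. For such situations it is possible to write (15.8) in a more
convenient form, involving the statistic r rather than the statistic b. If
$\beta = 0$, then (15.8) is simply

$$t = \frac{b}{\tilde{\sigma}_b} \tag{15.16}$$

But

$$b = r\,\frac{S_Y}{S_X} \qquad \text{[see (14.8)]}$$

and

$$\tilde{\sigma}_b = \frac{\tilde{\sigma}_{y \cdot x}}{\sqrt{\sum x_i^2}} = \frac{\tilde{\sigma}_{y \cdot x}}{S_X\sqrt{N}} \qquad \text{[see (15.6) and (6.5)]}$$

However,

$$\tilde{\sigma}_{y \cdot x} = \sqrt{\frac{N}{N-2}}\;S_{y \cdot x} \qquad \text{[see (14.25)]}$$

$$= \sqrt{\frac{N}{N-2}}\;S_Y\sqrt{1 - r^2} \qquad \text{[see (14.23)]}$$

Hence,

$$\tilde{\sigma}_b = \frac{S_Y\sqrt{1 - r^2}}{S_X\sqrt{N-2}}$$

Therefore,

$$t = r\,\frac{S_Y}{S_X} \cdot \frac{S_X\sqrt{N-2}}{S_Y\sqrt{1 - r^2}}$$

or

$$t = \frac{r\sqrt{N-2}}{\sqrt{1 - r^2}}, \qquad df = N - 2 \tag{15.17}$$

This formula (15.17) is clearly more convenient for testing the
hypothesis $\rho = 0$ in studies of relationship not involving prediction, for in such
studies r rather than b is the more useful statistic.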
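That (15.16) and (15.17) yield the same t can be confirmed numerically. The sketch below is an illustration under our own naming; it uses the counselor's sums quoted earlier in the chapter, computing t once from r and once from b and its estimated standard error.

```python
import math

def t_from_r(r, n):
    """t for testing rho = 0, as in (15.17)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def t_from_b(sum_x2, sum_y2, sum_xy, n):
    """t for testing beta = 0: b over its estimated standard error, as in (15.16)."""
    b = sum_xy / sum_x2
    se_b = math.sqrt((sum_x2 * sum_y2 - sum_xy ** 2) / ((n - 2) * sum_x2 ** 2))
    return b / se_b

sum_x2, sum_y2, sum_xy, n = 150.5, 27.8848, 42.76, 50
r = sum_xy / math.sqrt(sum_x2 * sum_y2)   # product-moment correlation
t_r = t_from_r(r, n)
t_b = t_from_b(sum_x2, sum_y2, sum_xy, n)
print(t_r, t_b)
```

The two printed values agree to within floating-point rounding, which is why r alone suffices when no prediction equation is wanted.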


By way of illustration we shall use the data of Table […] to test the
hypothesis that the correlation between scores on a […] test and height in
centimeters is zero for a population of […] pupils. We shall use a two-ended
test sensitive to the possibility either that $\rho < 0$ or that $\rho > 0$. Rather than
attempt to develop a situation which would dictate the choice of a small or
a large level of significance, we shall simply arbitrarily adopt .05 as the
level. Under these conditions and using the t-distribution for df = 40
(actually df = 50 − 2 = 48, but this t-distribution is not tabled), the
critical region is

R: $t \leq -2.02$ and $t \geq +2.02$


For the data of Table […], r = .01.* Hence, for the data at hand

$$t = \frac{.01\sqrt{50-2}}{\sqrt{1 - (.01)^2}} = +.07$$

and the hypothesis that $\rho = 0$ is retained. (Why?) It is important that the
student appreciate the fact that this outcome does not prove the
hypothesis to be true, i.e., does not prove that $\rho = 0$. [See the remarks
regarding the decision (Step 5) in the case of Solution II of the Problem of
the Principal and the Superintendent.]

*The student should verify this result as an exercise.


15.5 ESTABLISHING CONFIDENCE INTERVALS 
FOR $\beta$ AND $\mu_Y$


The formulas for the limits of the 100$\gamma$ per cent confidence interval for
the population slope (regression coefficient) $\beta$ can be written by application
of (12.11). Here the $S_1$ is, of course, the particular value of the slope
($b_1$) for the sample at hand, and $\tilde{\sigma}_S$ is $\tilde{\sigma}_b$ as given by (15.6) or its equivalent
(15.7). Using the latter form of $\tilde{\sigma}_b$ we have

$$\underline{\beta}, \overline{\beta} = b_1 \mp t_{\gamma/2}\sqrt{\frac{(\sum x_i^2)(\sum y_i^2) - (\sum x_i y_i)^2}{(N-2)(\sum x_i^2)^2}} \tag{15.18}$$*

where df = N − 2


Example. Using the data of the problem of the high school counselor
(see Table 14.3), establish the 95 per cent confidence interval for the slope,
$\beta$, of the prediction line for the population involved.

Here N = 50 so that df = 48. As in the preceding sections we shall use
the t-distribution for df = 40. Now $t_{\gamma/2} = t_{.475}$. Hence, we enter the
t-table of Appendix C, page 516, in the column headed P = .500 − $\gamma$/2
= .025. Here we find that for df = 40, t = 2.02. Since for the data at hand
$b_1 = +.284$, $\sum x_i^2 = 150.5$, $\sum y_i^2 = 27.8848$, and $\sum x_i y_i = 42.76$, application
of (15.18) gives:

$$\underline{\beta}, \overline{\beta} = +.284 \mp (2.02)\sqrt{\frac{(150.5)(27.8848) - (42.76)^2}{(50-2)(150.5)^2}} = .190,\ .378$$
Similarly, by application of (12.11), we can write formulas for the
limits of the 100$\gamma$ per cent confidence interval for the population value,
$\mu_Y$, of the Y-intercept of the prediction line. Here the $S_1$ is the value of the
Y-intercept ($\bar{Y}_1$) for the experience pool at hand, and $\tilde{\sigma}_S$ is $\tilde{\sigma}_{\bar{Y}}$ as given by
(15.11) or its equivalent (15.12). Using the latter form of $\tilde{\sigma}_{\bar{Y}}$ we have

$$\underline{\mu}_Y, \overline{\mu}_Y = \bar{Y}_1 \mp t_{\gamma/2}\sqrt{\frac{(\sum x_i^2)(\sum y_i^2) - (\sum x_i y_i)^2}{N(N-2)\sum x_i^2}} \tag{15.19}$$

where df = N − 2

Example. Using the data of the problem of the high school counselor
(see Table 14.3), establish the 95 per cent confidence interval for the
Y-intercept, $\mu_Y$, of the prediction line for the population involved.

Here $\bar{Y}_1 = 2.248$ and all other values including that for $t_{\gamma/2}$ are as in the
preceding example. Hence, application of (15.19) gives

$$\underline{\mu}_Y, \overline{\mu}_Y = 2.248 \mp (2.02)\sqrt{\frac{(150.5)(27.8848) - (42.76)^2}{(50)(50-2)(150.5)}} = 2.084,\ 2.412$$

*To conserve space we have written the two formulas involved in one line
by using the double minus or plus sign. The lower limit $\underline{\beta}$ is given by
application of the formula with the minus sign and the upper limit $\overline{\beta}$ by
application of the formula with the plus sign. This form will be used in
writing all subsequent formulas for interval limits.
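Both interval formulas follow one pattern, the statistic minus and plus a tabled t times its estimated standard error, and can be sketched as below. The helper name is our own, and the t-value is simply the tabled figure for df = 40 used in the text rather than one computed from the t-distribution.

```python
import math

def interval(estimate, standard_error, t_value):
    """Lower and upper limits: estimate -/+ t times the estimated standard error."""
    half_width = t_value * standard_error
    return estimate - half_width, estimate + half_width

sum_x2, sum_y2, sum_xy, n = 150.5, 27.8848, 42.76, 50
core = sum_x2 * sum_y2 - sum_xy ** 2
se_b = math.sqrt(core / ((n - 2) * sum_x2 ** 2))       # computing formula (15.7)
se_y_bar = math.sqrt(core / (n * (n - 2) * sum_x2))    # computing formula (15.12)

t_40 = 2.02   # tabled value for df = 40, 95 per cent interval
beta_limits = interval(0.284, se_b, t_40)
mu_limits = interval(2.248, se_y_bar, t_40)
print([round(v, 3) for v in beta_limits], [round(v, 3) for v in mu_limits])
```

Small differences in the third decimal place from hand computation can arise because the sketch carries the standard errors unrounded.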


15.6 THE SAMPLING DISTRIBUTION OF $\hat{Y}$


As indicated in (15.3), $\mu_{y \cdot x}$ is the symbol we have used to represent
the mean Y-score for a subpopulation of individuals making the same
X-score. Its estimator, $\hat{Y}$, is the point on the prediction line corresponding
to the particular value of X involved. If this value of X is expressed as a
deviation from $\bar{X}$, then (15.2) gives the estimated value of $\mu_{y \cdot x}$. That is,

$$\tilde{\mu}_{y \cdot x} = \hat{Y} = bx + \bar{Y} \tag{15.20}$$


Now the variance of the sampling distribution of $\hat{Y}$ for a given value
of x has been shown to be the sum of the variances of the bx and $\bar{Y}$ sampling
distributions. Since we are concerned only with a particular subpopulation,
the value of x is a constant. Hence, by (6.17) it follows that

$$\sigma_{bx}^2 = x^2\sigma_b^2$$

Therefore,

$$\sigma_{\hat{Y}}^2 = x^2\sigma_b^2 + \sigma_{\bar{Y}}^2$$

It has also been shown that for the situation or model described in
Section 15.1, the sampling distribution of $\hat{Y}$-values for a given subpopulation
(i.e., for a given x) is a normal distribution with a mean corresponding to
that of the subpopulation (i.e., $\mu_{y \cdot x}$). These facts are summarized in the
following rule:


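RULE 15.3. Given an infinity of sample experience pools selected from a
set of subpopulations of the type described and in the manner described in
Section 15.1. Then for any particular subpopulation the sampling distribution
of $\hat{Y}$ is a normal distribution with mean $\mu_{y \cdot x}$ and variance

$$\sigma_{\hat{Y}}^2 = x^2\sigma_b^2 + \sigma_{\bar{Y}}^2 \tag{15.21}$$

where $x = X - \bar{X}$, with X representing the particular X-score made by all
members of this subpopulation.

RULE 15.3a. The standard error of the sampling distribution of Rule 15.3
is

$$\sigma_{\hat{Y}} = \sqrt{x^2\sigma_b^2 + \sigma_{\bar{Y}}^2} \tag{15.22}$$

This standard error may be estimated by putting $\tilde{\sigma}_b$ as given by (15.6)
and $\tilde{\sigma}_{\bar{Y}}$ as given by (15.11) in place of $\sigma_b$ and $\sigma_{\bar{Y}}$. That is,

$$\tilde{\sigma}_{\hat{Y}} = \sqrt{\frac{x^2\tilde{\sigma}_{y \cdot x}^2}{\sum x_i^2} + \frac{\tilde{\sigma}_{y \cdot x}^2}{N}}$$

or

$$\tilde{\sigma}_{\hat{Y}} = \tilde{\sigma}_{y \cdot x}\sqrt{\frac{x^2}{\sum x_i^2} + \frac{1}{N}} \tag{15.23}$$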




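Or if we incorporate instructions for finding $\tilde{\sigma}_{y \cdot x}$ as given by (14.18)
and (14.20) we have

$$\tilde{\sigma}_{\hat{Y}} = \sqrt{\frac{\sum y_i^2 - \dfrac{(\sum x_i y_i)^2}{\sum x_i^2}}{N-2}\left[\frac{x^2}{\sum x_i^2} + \frac{1}{N}\right]}$$

or

$$\tilde{\sigma}_{\hat{Y}} = \sqrt{\frac{(\sum x_i^2)(\sum y_i^2) - (\sum x_i y_i)^2}{(N-2)\sum x_i^2}\left[\frac{x^2}{\sum x_i^2} + \frac{1}{N}\right]} \tag{15.24}$$

The estimated standard error of the sampling distribution of $\hat{Y}$ for a
given x-value as given by (15.23) or its equivalent (15.24) is appropriate
for use in computing the t-statistic as defined by (12.1). This estimate,
like $\tilde{\sigma}_b$ and $\tilde{\sigma}_{\bar{Y}}$, is a function of $\tilde{\sigma}_{y \cdot x}$ and hence has N − 2 degrees of freedom
[see remarks following (15.8)].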

In most practical situations, no relevant hypothesis regarding $\mu_{y \cdot x}$
exists. Instead the principal concern is with the accuracy with which
$\mu_{y \cdot x}$ is estimated. As will be shown in the following section, the foregoing
sampling theory makes it possible to establish confidence intervals for the
various subpopulation $\mu_{y \cdot x}$-values.

It is important to note that $\tilde{\sigma}_{\hat{Y}}$ depends upon the X-score made by the
individuals of the subpopulation involved. This is reflected in (15.24)
by the presence of the x-value. Inspection of (15.24) shows that $\tilde{\sigma}_{\hat{Y}}$ is
least for that subpopulation of individuals whose X-scores are at the
X-mean, for in this special case $x = X - \bar{X} = 0$, and $\tilde{\sigma}_{\hat{Y}}$ takes the same value
as $\tilde{\sigma}_{\bar{Y}}$ [see (15.12)]. The further the subpopulation X-score from $\bar{X}$, the
larger $\tilde{\sigma}_{\hat{Y}}$. From this it follows that the use of the prediction line to estimate
$\mu_{y \cdot x}$-values for extreme subpopulations is much less accurate than for
subpopulations which are more centrally located with reference to the
X-scale. In general, it is not wise to use prediction lines to estimate
$\mu_{y \cdot x}$-values for extreme subpopulations, particularly if these subpopulations
are not well represented in the experience pool.
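How quickly the standard error grows as the X-score moves away from the X-mean can be seen by evaluating (15.24) over a range of X-values. The sketch below uses the counselor's sums and the X-mean of 10.1 quoted in the text; the function name is our own.

```python
import math

def se_of_estimated_mean(x_score, x_bar, sum_x2, sum_y2, sum_xy, n):
    """Estimated standard error of Y-hat for one subpopulation, as in (15.24)."""
    x = x_score - x_bar                                    # deviation from the X-mean
    var_yx = (sum_x2 * sum_y2 - sum_xy ** 2) / ((n - 2) * sum_x2)
    return math.sqrt(var_yx * (x * x / sum_x2 + 1.0 / n))

# Standard errors for the X-scores represented in the experience pool.
errors = {X: round(se_of_estimated_mean(X, 10.1, 150.5, 27.8848, 42.76, 50), 3)
          for X in (7, 8, 9, 10, 11, 12, 13, 14)}
print(errors)
```

The smallest value occurs at the X-score nearest the mean, and the values at the extremes (7 and 14) are roughly twice as large, in keeping with the caution against estimating means for extreme subpopulations.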


15.7 CONFIDENCE INTERVALS FOR THE $\mu_{y \cdot x}$-VALUES


The formulas for the limits of the 100$\gamma$ per cent confidence intervals for the
various subpopulation Y-means can be written by application of
(12.11). Here the $S_1$ is the particular value of the estimated
subpopulation mean ($\hat{Y}_1$) for the experience pool at hand, and $\tilde{\sigma}_S$ is $\tilde{\sigma}_{\hat{Y}}$ as
given by (15.23) or its equivalent (15.24). Using the latter form of $\tilde{\sigma}_{\hat{Y}}$
we have

$$\underline{\mu}_{y \cdot x}, \overline{\mu}_{y \cdot x} = \hat{Y}_1 \mp t_{\gamma/2}\sqrt{\frac{(\sum x_i^2)(\sum y_i^2) - (\sum x_i y_i)^2}{(N-2)\sum x_i^2}\left[\frac{x^2}{\sum x_i^2} + \frac{1}{N}\right]} \tag{15.25}$$

where df = N − 2



Example. Using the data of the problem of the high school counselor
(see Table 14.3), establish the 95 per cent confidence interval for the
Y-mean ($\mu_{y \cdot x}$) of the subpopulation of individuals whose X-scores are 9.

Here N = 50 so that df = 48. However, as in preceding examples we
shall use the t-distribution for df = 40. Since $t_{\gamma/2} = t_{.475}$ we enter the
t-table of Appendix C, page 516, in the column headed P = .500 − $\gamma$/2
= .025. Here we find that for df = 40, t = 2.02. Moreover, since $\bar{X} = 10.1$,
we have for the data at hand

$$\hat{Y}_1 = (+.284)(9 - 10.1) + 2.248 = 1.936$$

Also $\sum x_i^2 = 150.5$, $\sum y_i^2 = 27.8848$, and $\sum x_i y_i = 42.76$.

Hence application of (15.25) gives

$$\underline{\mu}_{y \cdot x}, \overline{\mu}_{y \cdot x} = 1.936 \mp (2.02)\sqrt{\frac{(150.5)(27.8848) - (42.76)^2}{(50-2)(150.5)}\left[\frac{(9 - 10.1)^2}{150.5} + \frac{1}{50}\right]}$$

$$= 1.936 \mp (2.02)(.096) = 1.742,\ 2.130$$


Table 15.1 gives the limits of the 95 per cent confidence intervals for
the Y-means of all the subpopulations represented in the high school
counselor's experience pool, that is, for X-values of 7, 8, 9, 10, 11, 12, 13
and 14. Also included are the $\underline{\mu}_{y \cdot x}$ and $\overline{\mu}_{y \cdot x}$-values for the hypothetical*
subpopulation of individuals whose X-scores have the value 10.1. Note
that these latter values are identical with those previously determined
for $\underline{\mu}_Y$ and $\overline{\mu}_Y$. This follows from the fact that when X = 10.1, x = 0,

TABLE 15.1  Limits $\underline{\mu}_{y \cdot x}$ and $\overline{\mu}_{y \cdot x}$ of the 95 Per Cent Confidence
            Intervals for the Y-means of the Subpopulations
            Involved in the High School Counselor's Experience
            Pool of Table 14.3.

    X                 $\tilde{\sigma}_{\hat{Y}}$      $\underline{\mu}_{y \cdot x}$      $\overline{\mu}_{y \cdot x}$
    7                 .166      1.038      …
    8                 .127      1.395      …
    9                 .096      1.742      2.130
    10                .081      …          …
    10.1 = $\bar{X}$        .081      2.084      2.412
    11                .091      2.320      …
    12                .120      2.546      …
    13                .158      2.753      …
    14                .199      2.954      …

*Hypothetical in the sense that, while the X-trait is assumed to be
continuous, the X-scores are integers, i.e., consist of measurements reported
to the nearest whole number.




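Or, if we incorporate instructions for finding $\tilde{\sigma}_{y \cdot x}$ as given by (14.18) and
(14.20) we have

$$\tilde{\sigma}_e = \sqrt{\frac{(\sum x_i^2)(\sum y_i^2) - (\sum x_i y_i)^2}{(N-2)\sum x_i^2}\left[\frac{x^2}{\sum x_i^2} + \frac{1}{N} + 1\right]} \tag{15.29}$$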


We can now use the t-distribution for N − 2 degrees of freedom to
establish the error which would be exceeded in either direction 100(1 − $\gamma$)
per cent of the time in the long run. Let e represent the error. Then

$$t = \frac{e - 0}{\tilde{\sigma}_e} = \frac{e}{\tilde{\sigma}_e}$$

Hence,

$$e = t\tilde{\sigma}_e, \qquad df = N - 2$$

and the error which will be exceeded 100(1 − $\gamma$) per cent of the time in
either direction in the long run is

$$e_{1-\gamma} = t_{\gamma/2}\,\tilde{\sigma}_e, \qquad df = N - 2 \tag{15.30}$$


It is now possible to establish the limits of the 100$\gamma$ per cent confidence
interval for a prediction or regression estimate which is interpreted as an
estimate of an individual score. Let Y represent the actual value of the
Y-score of an individual selected at random from the subpopulation
involved. Then the limits of the 100$\gamma$ per cent confidence interval for Y
are given by

$$\underline{Y}, \overline{Y} = \hat{Y} \mp t_{\gamma/2}\,\tilde{\sigma}_e \tag{15.31}$$

where df = N − 2 and $\tilde{\sigma}_e$ is as given by (15.29).


Example. Using the data of the high school counselor's experience
pool, establish the 95 per cent confidence interval for the Y-score (Y) of
an individual whose X-score is 9.

Here N = 50 so that df = 48. However, as in preceding examples, we
shall use the t-distribution for df = 40. Since $t_{\gamma/2} = t_{.475}$, we enter the
t-table, Appendix C, page 516, in the column headed P = .500 − $\gamma$/2 = .025.
Here we find that for df = 40, t = 2.02. Since $\bar{X} = 10.1$, we have for the
data at hand

$$\hat{Y} = (+.284)(9 - 10.1) + 2.248 = 1.936$$

Also for the data at hand

$$\tilde{\sigma}_e = \sqrt{\frac{(150.5)(27.8848) - (42.76)^2}{(50-2)(150.5)}\left[\frac{(9 - 10.1)^2}{150.5} + \frac{1}{50} + 1\right]} = .581$$

Hence, application of (15.31) gives

$$\underline{Y}, \overline{Y} = 1.936 \mp (2.02)(.581) = 0.762,\ 3.110$$
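The interval for an individual score differs from that for a subpopulation mean only by the added 1 inside the bracket of (15.29). A minimal sketch, again with names of our own choosing and the tabled t for df = 40, is:

```python
import math

def se_individual(x_score, x_bar, sum_x2, sum_y2, sum_xy, n):
    """Estimated standard error of an individual prediction error, as in (15.29)."""
    x = x_score - x_bar
    var_yx = (sum_x2 * sum_y2 - sum_xy ** 2) / ((n - 2) * sum_x2)
    # The trailing 1.0 adds the variability of the individual about the
    # subpopulation mean to the sampling error of the estimated mean.
    return math.sqrt(var_yx * (x * x / sum_x2 + 1.0 / n + 1.0))

y_hat = 0.284 * (9 - 10.1) + 2.248            # predicted Y for an X-score of 9
se_e = se_individual(9, 10.1, 150.5, 27.8848, 42.76, 50)
t_40 = 2.02                                   # tabled value, df = 40, 95 per cent
lower, upper = y_hat - t_40 * se_e, y_hat + t_40 * se_e
print(round(y_hat, 3), round(se_e, 3), round(lower, 2), round(upper, 2))
```

Substituting a smaller tabled t (for a lower confidence coefficient) narrows the interval without changing the standard error itself.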




Comment. This result will undoubtedly give rise to some amusement
on the part of the discerning student. For an individual whose aptitude
(X) score is somewhat below average [(9 − 10.1)/1.73 = −.64 standard
deviations] we have, using a confidence coefficient of .95, predicted a college
freshman year grade-point average anywhere from roughly 0.8 (less than a
D average) to 3.1 (slightly better than a B average). Such a prediction
scarcely seems to be of sufficient accuracy to be of much value. This result
further emphasizes the importance of basing predictions not only upon
large experience pools but also, and what is more important, upon
information about a variable that is closely related to the variable to be predicted.
The closer this relationship, the greater the reduction in $\tilde{\sigma}_{y \cdot x}$ (and also in
$\tilde{\sigma}_e$) and the smaller $\tilde{\sigma}_e$, the narrower the confidence interval, that is, the
more accurate the prediction of the individual Y-score. While predictions
based on low relationships ought always to be avoided if possible, the fact
remains that, if they cannot be avoided, it is better to take knowledge of
the predictor variable into account than not to consider it at all.

Of course, if we are willing to use a smaller confidence coefficient, the
interval may be narrowed. It is not uncommon in making individual
predictions to determine the error that would be exceeded 50 per cent of the
time in the long run.* This amounts to using $\gamma = .5$. That is, the long-run
probability that intervals determined by using (15.31) with $\gamma/2 = .25$ will
contain the actual individual Y-score is one-half. In our example, the value
of t for $\gamma/2 = .25$ is .68. Hence, the limits of the 50 per cent confidence
interval would be

$$\underline{Y}, \overline{Y} = 1.936 \mp (.68)(.581) = 1.541,\ 2.331$$


* This error is referred to as the probable error.


15.9 CROSS-VALIDATION

We have given considerable attention to the accuracy with which
estimates of subpopulation means may be used as estimates of the Y-scores
of individual members of the subpopulations. In Section 14.7 we saw how
the correlation between X and Y provides an index of the accuracy of
predictions so interpreted, being, in fact, the same in value as the
correlation between the actual and predicted Y-scores of the individuals in
the sample. In Section 14.8 we saw how the standard error of estimate for
the sample also serves as such an index, being the square root of the mean
of the squares of the differences between actual and predicted values, and,
hence, a sort of average of the errors involved in individual predictions.

These interpretations, however, refer to the application of the prediction
equation to the members of the sample at hand. Since the Y-scores for these
individuals are known to us, we could not possibly have any practical
interest in "predicting" them. Our real concern has to do with the accuracy
with which the equation can be applied to individuals whose Y-scores are
unknown.

In the preceding section we refined the determination of the standard
error of estimate as an index of accuracy of individual estimates to take into
account sampling fluctuations in b and Ȳ, and used it to make an interval
estimate of an individual's Y-score. However, this presupposed the
conditions of the regression model. The collection of individuals whose
Y-scores may need to be predicted may not have been selected in the manner
prescribed by this model. Some of them may even be from subpopulations
not specifically represented in the sample. It is for such collections of
individuals that we may wish to assess the accuracy of a particular
regression equation.

While the correlation between actual and predicted Y-scores serves as 
an index of accuracy of individual predictions in the more or less trivial 
case of the application of the prediction equation to the particular indi- 
viduals furnishing the data for its derivation, this correlation cannot be 
used to assess the accuracy with which this same equation can be applied 
to a different group. For such a different group the correlation between 
actual and predicted Y-scores will always be the same as between the X- 
and Y-scores for that group regardless of the appropriateness of the equa- 
tion so used. This follows directly from Rules 13.2 and 13.3. Hence, to 
assess the accuracy with which a regression equation derived from one 
group may be applied to the individuals of another which may be so
constituted that the conditions of the regression model are not satisfied, it is
necessary to use an index based on actual errors of estimate (i.e., Y − Ŷ
differences). A convenient index is a value analogous to the standard error
of estimate, namely, the square root of the mean of the squares of these 
actual error values for the members of the second group. This index will
ordinarily be larger than, and can never be smaller than, the standard error
of estimate for this second group, which, of course, is a value based on the
application of the equation which is optimal for that group. It will indicate
the agreement (or, rather, disagreement) between actual scores and scores
obtained by application of a prediction equation derived from data for
different individuals. To obtain the value of this index for a group of
individuals other than those comprising the original sample we may proceed
as follows:


1. Select a new sample at random from the population of individuals for which
   the predictions are to be made.
2. Obtain the X-score for each member of this new sample.
3. Using these X-scores and the prediction equation derived from the original
   experience pool, obtain a predicted Y-score for each member of this new
   sample.
4. Obtain the actual Y-score for each member of this new sample.
5. Obtain the square root of the mean of the squares of the differences between
   actual and predicted Y-scores for this new sample.


The index obtained in Step 5 reflects the accuracy (or, really,
inaccuracy) with which the prediction or regression equation derived from the
original experience pool may be applied to new or different individuals who
may not have been selected in accordance with the requirements of the
regression model. This process is referred to as a cross-validation of a
prediction or regression equation, and the index obtained in Step 5 is a
cross-validation standard error of estimate. A comparison of the value of this
index with that of the actual standard error of estimate for the
cross-validation sample provides some indication of the cost in accuracy of the
application of a somewhat less than optimal prediction equation to
individuals who were not members of the original experience pool.

The practical difficulty usually encountered in effecting a
cross-validation is the unavailability of a new sample of individuals whose
actual Y-scores are known, for usually all such individuals are included in
the original experience pool. One thing that can always be done to
circumvent this difficulty is to subdivide at random the original experience
pool, using one part to determine the regression equation and the other to
effect the cross-validation.
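The cross-validation procedure, including the random subdivision of the experience pool, can be sketched in Python as follows. Everything named here is an assumption made for illustration: the helper functions `fit_line` and `rms_error`, the fixed random seed, the even split, and above all the data in `pool`, which are invented and do not come from the text.

```python
import math
import random

def fit_line(xs, ys):
    """Least-squares slope b and intercept c for predicting Y from X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return b, mean_y - b * mean_x

def rms_error(xs, ys, b, c):
    """Square root of the mean squared difference between actual and predicted Y."""
    return math.sqrt(sum((y - (b * x + c)) ** 2 for x, y in zip(xs, ys)) / len(xs))

# Hypothetical experience pool of (X, Y) pairs, invented for illustration only.
pool = [(5, 1.2), (6, 1.9), (7, 1.5), (8, 2.4), (9, 1.8), (10, 2.2),
        (11, 2.9), (12, 2.3), (13, 3.1), (14, 2.6), (15, 3.4), (16, 3.0)]

# Subdivide the pool at random: one half derives the equation, the other validates it.
rng = random.Random(0)
shuffled = pool[:]
rng.shuffle(shuffled)
derivation, validation = shuffled[:6], shuffled[6:]

dx, dy = zip(*derivation)
vx, vy = zip(*validation)

b, c = fit_line(dx, dy)                      # equation from the derivation half
cross_val_index = rms_error(vx, vy, b, c)    # Step 5, applied to the other half

b_opt, c_opt = fit_line(vx, vy)              # equation optimal for the validation half
own_index = rms_error(vx, vy, b_opt, c_opt)  # its ordinary standard error of estimate

print(round(cross_val_index, 3), round(own_index, 3))
```

Comparing `cross_val_index` with `own_index` gives the "cost in accuracy" referred to in the text. The first can never be smaller than the second, because the least-squares line fitted to the validation half minimizes the mean squared error for that half.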


15.10 THE NORMAL BIVARIATE MODEL


The regression model described in Section 15.1 is the model that is
appropriate when the primary problem at hand is that of estimating one
variable (Y), given information about another (X). In this model the samples
are so chosen that the X-scores are the same for all samples. That is, the
samples consist of subsamples of Y-values selected at random from
subpopulations of such values, each such subpopulation being characterized by
the fact that the same particular X-score value is associated with all its
members. Moreover, each subsample involves the same number of cases from one
repetition of the sampling procedure to another. The model thus allows for no
random variation in the X-score values, and the sampling theory based on it
reflects the effects of random variations only in the Y-values.

While this model is appropriate for the prediction problem, it is not
appropriate for the situation in which the primary concern is with the
population relationship between two variables and in which the pairs of X-
and Y-values are selected at random from a population of such pairs, so that
the effects of random variation apply to the X- as well as to the Y-values.
Consequently, a different model is needed. An appropriate model is known as
the normal bivariate model. Although this model may be described in precise
mathematical terms, we shall attempt only a verbal description.



The bivariate frequency distribution was defined in Section 13.3. 
Consider such a distribution in the case of an infinite population of indi- 
viduals for each of whom there exists a pair of scores, say X and Y. Now 
suppose that the X-scores considered alone are normally distributed for 
the population and suppose the same to be true of the Y-scores. Further 
suppose that the classes or intervals of both X- and Y-scales in this bi- 
variate frequency distribution are infinitesimal. Then the bivariate table 
(for examples of such tables, see Tables 13.7 and 13.8) would contain an 
infinity of cells formed by the intersections of the vertical columns extend- 
ing from the X-scale intervals and the horizontal rows extending from the 
Y-scale intervals. Corresponding to each of these cells there is some joint
relative frequency—that is, the proportion of individuals whose X- and 
Y-scores fall simultaneously or jointly in the X and Y intervals which are 
associated with the particular cell. Now if for any vertical, horizontal, or 
diagonal array of cells, these joint relative frequencies are those of a normal 
distribution, the population is said to be a normal bivariate population. 
An investigation of the sampling distribution of values of a statistic such as 
r, which values are derived from samples selected at random from a normal
bivariate population, is said to involve the normal bivariate model. 


15.11 FISHER'S LOGARITHMIC TRANSFORMATION OF r

If for a normal bivariate population or model the value of the
population correlation (ρ) is in the neighborhood of zero, and if the number
of pairs in a random sample from such a population is large, the sampling
distribution of the sample correlation (r) tends toward a normal
distribution. However, if the population correlation differs from zero the
sampling distribution departs from normality in form unless the sample is
extremely large. This departure becomes more and more marked as the value of
ρ approaches plus or minus one. That such is the case is easily seen to be
intuitively reasonable when one considers that the maximum absolute value
any r can assume is unity and that if ρ is large, say +.80 or +.90, the
sample r-values have much more room to vary to the lower side of ρ than to
the higher side. In such situations, therefore, the sampling distribution of
r will be negatively skewed. On the other hand, if ρ is in the neighborhood
of −.80 or −.90, the sampling distribution of r will be positively skewed.
If ρ is near zero the distribution will tend to be symmetrical, but will
still not be normal for small samples. Consequently, for testing hypotheses
about ρ (particularly hypotheses other than ρ = 0, or hypotheses regarding
the equality of the ρ-values of two populations), normal-distribution
sampling theory is not appropriate; to test such hypotheses, special methods
must be employed if reliable results are to be obtained.



In 1915 R. A. Fisher* introduced a new statistic which was a function
of r. Suppose we have given an infinity of r-values each based on a random
sample of N pairs of values selected from a normal bivariate population
for which the correlation coefficient has the value ρ. These r-values, of
course, constitute the sampling distribution of r for random samples of N
cases from this population. Fisher showed that if each of these r-values
were transformed into this new statistic, the resulting values would be
approximately normally distributed, with a mean corresponding to the
transformed value of ρ and variance of 1/(N − 3). He demonstrated that
this would be true regardless of the value of ρ even for samples that are
quite small. Since this new statistic is a function of, or a transformation
of, a value on the r-scale, it is possible, by expressing any ρ- or r-value in
terms of this new statistic, to use its sampling distribution indirectly to
test hypotheses about ρ.

Fisher called this new statistic z, but we shall designate it by z_r, to
prevent confusion with previous meanings we have associated with z. This
statistic is defined as follows:

$$z_r = \frac{1}{2}\log_e\frac{1+r}{1-r}$$

The z_r-value corresponding to any r-value may be obtained by computing
(1 + r)/(1 − r) and by using a table of natural logarithms to find
one-half the logarithm of the result of this computation. It is not
necessary, however, for the student to master the use of a table of natural
logarithms in order to effect this transformation. Values of z_r
corresponding to values of r from .000 to .995 by increments of .005 have been
determined and are given in Table VII, Appendix C. To find the z_r-value
corresponding to any of these r-values, it is necessary only to refer to
this table.

The foregoing theory may be summarized in the form of a rule.


RULE 15.4. For random samples of N from a normal bivariate population
for which the correlation between the variables is ρ, the sampling
distribution of

$$z_r = \frac{1}{2}\log_e\frac{1+r}{1-r} \qquad (15.32)$$

is approximately a normal distribution with mean

$$z_\rho = \frac{1}{2}\log_e\frac{1+\rho}{1-\rho}$$

and with variance

$$\sigma^2_{z_r} = \frac{1}{N-3}$$

Examples illustrating the application of Fisher's z_r statistic to tests of
hypotheses about population correlation coefficients and to the
determination of interval estimates for such coefficients are given in the
following sections.

* R. A. Fisher, "Frequency Distribution of the Values of the Correlation
Coefficient in Samples from an Indefinitely Large Population," Biometrika,
Vol. 10 (1915), pp. 507-521.
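For readers working without Table VII, the transformation and its standard error are easy to compute directly. The following Python sketch is an illustration only; the function names `fisher_z`, `inverse_fisher_z`, and `fisher_z_std_error` are introduced here and are not the book's.

```python
import math

def fisher_z(r):
    """Fisher's transformation: one-half the natural log of (1 + r)/(1 - r)."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

def inverse_fisher_z(z):
    """Recover r from a transformed value (the hyperbolic tangent of z)."""
    return math.tanh(z)

def fisher_z_std_error(n):
    """Standard error of the transformed value for a random sample of n pairs."""
    return 1.0 / math.sqrt(n - 3)

print(round(fisher_z(0.94), 3))
print(round(fisher_z(0.90), 3))
print(round(fisher_z_std_error(75), 3))
```

The transformation is the inverse hyperbolic tangent, so Python's built-in `math.atanh(r)` returns the same value as `fisher_z(r)`, and `math.tanh` plays the role of reading Table VII in reverse.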


15.12 TESTING A NON-ZERO* HYPOTHESIS ABOUT A POPULATION
CORRELATION COEFFICIENT


Suppose that according to a certain theory the relationship between 
two variables (X and Y) for a specified normal bivariate population is .90. 
To test this theory, a researcher selects from this population a random 
sample of 75 cases, and obtains X- and Y-scores for each. He finds the
correlation for this sample to be .94. The theory may now be tested 
statistically as follows: 

STEP 1. H: ρ = .90; alternatives: ρ < .90 and ρ > .90

Comment. For purposes of this example we have assumed that the
actual value of the population correlation may conceivably differ in either
direction from the value hypothesized.

STEP 2. α = .05

Comment. This simply represents an arbitrary selection for illustrative
purposes.

STEP 3. R: z ≤ −1.96 and z ≥ +1.96; or |z| ≥ 1.96

Comment. The Fisher transformation of r is approximately normally
distributed; hence, the unit normal deviate, z, may be used as the test
statistic.

STEP 4. Calculation of z for data at hand.

To compute the value of the test statistic, we proceed as follows: From
Table VII, Appendix C:

$$z_r = z_{.94} = 1.738$$

and

$$z_\rho = z_{.90} = 1.472$$

Also

$$\sigma_{z_r} = \frac{1}{\sqrt{N-3}} = \frac{1}{\sqrt{72}} = .118$$

Hence, applying the sampling theory summarized in Rule 15.4, we have

$$z = \frac{z_r - z_\rho}{\sigma_{z_r}} = \frac{1.738 - 1.472}{.118} = \frac{.266}{.118} = +2.25$$

STEP 5. Decision. Reject H. (Why?)


Comment. Note that this decision also implies rejection of the
alternative ρ < .90. (Why?) Hence, the only remaining possibility is that
ρ > .90.

* Fisher's z_r statistic may be used quite satisfactorily to test zero
hypotheses about population correlation coefficients, i.e., to test hypotheses
that ρ = 0. However, in Section 15.4 we have already presented an exact test
of this hypothesis, a test which is easily applied. Hence, we shall illustrate
the use of z_r only in connection with tests of hypotheses about some value
of ρ other than zero.
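The test of the hypothesis ρ = .90 with r = .94 and N = 75 can be reproduced in Python. This is a hedged sketch: the function name `z_test_for_rho` is hypothetical, and `math.atanh` is used in place of Table VII, so the transformed values are not rounded to three places as they are in the hand calculation.

```python
import math

def z_test_for_rho(r, n, rho_0):
    """Unit normal deviate for testing H: rho = rho_0 via Fisher's transformation."""
    z_r = math.atanh(r)        # transformed sample correlation
    z_rho = math.atanh(rho_0)  # transformed hypothesized correlation
    std_error = 1.0 / math.sqrt(n - 3)
    return (z_r - z_rho) / std_error

z = z_test_for_rho(r=0.94, n=75, rho_0=0.90)
reject = abs(z) >= 1.96        # two-tailed critical region at the .05 level

print(round(z, 2), reject)
```

Because no intermediate rounding takes place, the computed z differs slightly from the 2.25 obtained with the rounded table values, but it falls in the same critical region and leads to the same decision.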


15.13 ESTABLISHING A CONFIDENCE INTERVAL FOR ρ


It is not at all common to encounter situations in which a relevant
hypothetical non-zero value of ρ exists. In practice, then, we shall find
little need for the test illustrated in the foregoing section. What is more
likely to be called for is an estimate of the value of the population
correlation. In this section we shall illustrate the use of the Fisher
transformation to establish an interval estimate of ρ.

Since for a normal bivariate population with correlation ρ the statistic
z_r is approximately normally distributed with mean z_ρ and standard error
1/√(N − 3), it is possible to apply the reasoning of Section 11.4 to the
problem of approximating a 100γ per cent confidence interval for the
population value of z_ρ (i.e., the value of the Fisher z corresponding to ρ).
The formulas for these limits are:

$$\underline{z}_\rho, \overline{z}_\rho = z_r \mp z_{\gamma/2}\,\frac{1}{\sqrt{N-3}} \qquad (15.35)$$

Once the lower and upper limits for z_ρ are determined it is possible to
determine the corresponding limits for ρ by entering Table VII, Appendix C,
with the two z_ρ limits and by reading the corresponding values of r.

To illustrate we shall establish the 95 per cent confidence interval for
ρ given r = .94 for a sample of 75. Here z_{γ/2} = z_{.475} = 1.96 and

$$\frac{1}{\sqrt{N-3}} = \frac{1}{\sqrt{75-3}} = .118$$

Also, from Table VII, Appendix C, z_{.94} = 1.738. Now applying (15.35) we
have

$$\underline{z}_\rho, \overline{z}_\rho = 1.738 \mp (1.96)(.118) = 1.5067,\ 1.9693$$

Now in Table VII, Appendix C, the value of z_r closest to 1.5067 is
1.499177 and the value of z_r closest to 1.9693 is 1.945906. These correspond
to r-values of .905 and .960 respectively. Hence, the limits of the 95 per
cent confidence interval for ρ are .905 and .960. Unlike all previous
illustrations of confidence intervals, the limits of this interval are at
different distances from the sample r. This result is, of course, consistent
with the skewness of the r-sampling distribution.
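The interval estimate for r = .94 and N = 75 can also be obtained without the table, by transforming, setting limits on the transformed scale, and transforming back. The sketch below is illustrative only; the function name `rho_confidence_interval` and its default critical value of 1.96 (the 95 per cent case) are assumptions made here.

```python
import math

def rho_confidence_interval(r, n, z_crit=1.96):
    """Approximate confidence limits for rho using Fisher's transformation."""
    z_r = math.atanh(r)                    # Fisher transformation of the sample r
    half_width = z_crit / math.sqrt(n - 3) # critical value times standard error
    lower_z, upper_z = z_r - half_width, z_r + half_width
    # Transform the limits back to the correlation scale.
    return math.tanh(lower_z), math.tanh(upper_z)

lower, upper = rho_confidence_interval(r=0.94, n=75)
print(round(lower, 3), round(upper, 3))
```

The limits computed this way differ a little from .905 and .960 because the hand calculation reads the nearest tabled entry, which is spaced at increments of .005 in r. The limits remain unequally distant from the sample r, as the text observes.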


15.14 TEST OF THE HYPOTHESIS THAT TWO NORMAL BIVARIATE
POPULATIONS HAVE THE SAME ρ-VALUE

The question often arises whether the degree of relationship between
one pair of variables is different from that for another pair. This question
may be encountered in a variety of ways:



(1) It may involve the correlation between variates X and Y for one
population as compared with that between these same variates for another
population. For example, is the correlation between height and weight for
fifteen-year-old boys the same as for adult males? Or, is the correlation
between performance on the Wechsler Intelligence Scale for Children
(WISC) and performance on an arithmetic achievement test for sixth-grade
boys the same as for sixth-grade girls?

(2) It may involve the correlation between variates X and Y for one
population as compared with that between variates A and B for either the
same or different populations. For example, is the correlation between
performance on a dental aptitude test and success as a dental student for
a population of dental students the same as that between performance on
an engineering aptitude test and success as an engineering student for a
population of engineering students?* Or, is the correlation between pitch
discrimination ability and tonal memory for a population of high-school-
age girls the same as that between sense of time and rhythm for the same
population?

(3) It may involve the correlation between variates X and Y for one
population as compared with that between variates Y and A for either the
same or another population. For example, is the correlation between college
achievement and performance on a scholastic aptitude test for a population
of college men equal to the correlation between college achievement and
some measure of high school achievement for this same (or some different)
population?†

* This amounts to asking whether success in one area can be predicted with
more or less accuracy than success in another.
† This amounts to asking whether success in a given area is better predicted
by one variable than by another.

Even though the same individuals or objects are involved, each of the
foregoing situations can be viewed as pertaining to two bivariate
populations. For instance, in the last example cited the population objects
are college men. Nevertheless, for these objects two bivariate populations of
scores exist, namely, one involving measures of college achievement and
scholastic aptitude, and the other involving measures of college achievement
and high school achievement. Now if both of these bivariate populations
can be reasonably assumed to be normal, the hypothesis that the
correlation (ρ) for one of them is of the same magnitude as that for the
other can be tested by using the Fisher transformation. The procedure is
identical for all situations, requiring only that the sample r-values be based
on independent random samples of population objects. That is, in the case of
the example involving college men, the random sample of such men for which
the correlation between college achievement and scholastic aptitude is
obtained must be selected independently of the random sample of such men for
which the correlation between college and high school achievement is
obtained.*

Since the test procedure is the same for all the situations described, so
long as independent random samples are employed, a single illustrative
example will suffice. Suppose that the correlation between IQ (as measured
by the Revised Stanford Binet Scale) and performance on a particular group
test of mental ability is found to be .56 for a random sample of 60 school
children at the third-grade level. Suppose further that for a random sample
of 40 school children at the fifth-grade level the correlation between these
same variates is found to be .74. The question is, how do the population
correlations compare, that is, are they equal in magnitude or is one or the
other larger? This question may be answered by a test of a statistical
hypothesis. Designate the third- and fifth-grade populations as 1 and 2
respectively. Then:

STEP 1. H: ρ1 − ρ2 = 0; alternatives: ρ1 − ρ2 < 0 and ρ1 − ρ2 > 0.

Comment. Since either population may have the larger ρ-value, both
of the above alternatives to H must be considered.

STEP 2. α = .05

Comment. This value was arbitrarily selected for purposes of
illustration since the terms of the example provide no basis for choosing a
particular α-value.

STEP 3. R: z ≤ −1.96 and z ≥ +1.96

Comment. We know from Rule 15.4 that if the two bivariate populations
are normal the Fisher transformations of r1 and r2 will be approximately
normally distributed. Hence, by Rule 9.6 the difference between them
will be normally distributed and the unit normal deviate is, therefore,
appropriate as a test statistic.

STEP 4. Computation of z for the sample.

Comment. In the comment under Step 3, it was pointed out that since
z_r1 and z_r2 are normally distributed, it follows from Rule 9.6 that the
difference z_r1 − z_r2 is normally distributed. Rule 9.6 further indicates
that this sampling distribution of differences will have a mean value of
z_ρ1 − z_ρ2 and a variance of σ²_{z_r1} + σ²_{z_r2}. Hence, the test
statistic z is given by

* Procedures for testing the hypothesis that ρ_xy = ρ_ab or that ρ_xy = ρ_xa
which use the same sample of individuals to obtain both the sample r-values
are available but are beyond the scope of this text.

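The comparison of the third-grade and fifth-grade correlations (r = .56 with N = 60, r = .74 with N = 40) can be carried through numerically. The sketch below assumes the test statistic that follows from the mean and variance just stated: under the hypothesis the mean of the difference is zero, so the difference between the two transformed r-values is divided by the square root of 1/(N1 − 3) + 1/(N2 − 3). The function name `z_test_two_rhos` is hypothetical, and the result is a computed illustration, not a figure quoted from the text.

```python
import math

def z_test_two_rhos(r1, n1, r2, n2):
    """Unit normal deviate for H: rho1 - rho2 = 0, independent random samples."""
    z_r1 = math.atanh(r1)  # Fisher transformation of the first sample r
    z_r2 = math.atanh(r2)  # Fisher transformation of the second sample r
    # Variance of a difference of independent statistics is the sum of variances.
    std_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z_r1 - z_r2) / std_error

z = z_test_two_rhos(r1=0.56, n1=60, r2=0.74, n2=40)
reject = abs(z) >= 1.96    # critical region from Step 3

print(round(z, 2), reject)
```

With these sample values the computed z does not reach the critical region of Step 3, so on this sketch the hypothesis of equal ρ-values would be retained at the .05 level.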


APPENDIX A


GLOSSARY OF SYMBOLS


Symbols which are more or less unique to the discussion in which they
are used and defined are not included in this glossary. The numbers in
parentheses following the definition indicate the page on which the symbol
is first used.

This glossary is divided into three parts as follows: (1) non-alphabetical
symbols, (2) English letter symbols, and (3) Greek letter symbols. The
symbols in (1) are arranged alphabetically according to name. The symbols
in (2) and (3) are arranged alphabetically according to the alphabet
involved.


1. NON-ALPHABETICAL SYMBOLS


∓       Double minus and plus sign indicating both the difference and sum
        of two quantities. E.g., a ∓ b indicates both a − b and a + b. (452)

≈       "Is approximately equal to." (52)

>, <    "Greater than" and "less than" signs. Used to indicate that one
        quantity is greater than (or less than) another. E.g., a > b
        indicates that the quantity represented by a is greater than the
        quantity represented by b; a < b indicates that the quantity
        represented by a is less than the quantity represented by b. (265)

≥, ≤    "Greater than or equal to" and "less than or equal to" signs. A
        combination of the "greater than" sign (>) or "less than" sign (<)
        with the "equals" sign (=). E.g., a ≥ b indicates that the quantity
        represented by a is either greater than or equal to the quantity
        represented by b. (272)

∞       "Infinity." (184)

√       "Square root" sign. E.g., √a indicates the square root of the
        quantity represented by a. (140)



~       "Tilde." The tilde, placed above the symbol representing a
        population parameter, indicates an estimate of this parameter.
        E.g., if a population standard deviation is represented by σ, then
        σ̃ represents an estimate of its value. (258)


2. ENGLISH LETTER SYMBOLS


A

A, B

C

c and d

cf

C1, C2, etc.

D

D

df

D1, D2, etc.

e


Value of an arbitrary reference point on a score scale. (117)

Magnitudes of constants. (390)

Slope of a linear prediction (i.e., regression) equation for a real
collection of data. (413)

Number of classes in a frequency distribution. (90)

Y-intercept of a linear prediction (i.e., regression) equation for a real
collection of data. (415)

Magnitude of a constant. (72)

Distances which, marked off below and above the population parameter on
the scale of values of the statistic, establish an interval such that the
probability of the statistic in this interval is some arbitrarily selected
value. (223)

Cumulative frequency of a class in a frequency distribution. (73)

First centile, second centile, etc. (2.2)

Magnitude of a constant. (714)

Difference between a pair of scores. (375)

Mean of a real collection of differences between pairs of scores. (317)

Number of degrees of freedom of a statistic. (338)

First decile, second decile, etc. (77)

Limiting value of (1 + w)^(1/w) as w approaches zero, approximately
2.7183. Used as the base in the natural system of logarithms. (180)

Difference between an individual's actual Y-score and his predicted
(i.e., regressed) Y-score. I.e., the error in a predicted score. (427)




EA 


E, E GH 
E(X) 


1736 


k 


log.N 


M 


M 


MD 




Extreme Area. The probability of a value of a statistic being equal to or
greater than a particular observed value if the hypothesis is true. (312)

Magnitudes of constants. (397)

Expected value of a score (X) selected at random from a collection of
scores. (126)

Magnitude of a class (interval) frequency in a frequency distribution. (14)

Frequency of a cell in a bivariate frequency distribution. (392)

Frequency of a column (f_x) or row (f_y) in a bivariate frequency
table. (391)

Sum of squares of the deviations of a given collection of points from the
least-squares line of best fit. The deviations involved are measured along
the Y-scale (axis). (416)

A statistical hypothesis to be tested. (273)

Used as subscripts to designate individuals in a collection. (48, 50, 393)

Size of an interval (class) in a frequency distribution. (2/1)

Number of subgroups of scores in a collection organized into
subgroups. (56)

Value of the largest score in a collection. (136)

Linearly transformed value of an interval midpoint. (397)

Natural logarithm (base e) of the number represented by N. (463)

Number of scores available for each individual member of a group. (60)

Over-all mean of a real collection of scores which is made up of a number
of subcollections (subgroups). (109)

Mean of a linearly transformed collection of scores. I.e., the standard
value adopted as the mean of a system of standard scores. (163)

Mean deviation. (138)




Мал 
Мо 
Mẹ 


N, n 


P(w|U) 
Q 
Qi, Q», Qs 


T, txr 


ref 


1S or relf 




Median. (71)

Mode. (100)

Mean of a complete real collection of predicted (i.e., regressed)
scores. (421)

Number of scores in a collection. (47) When a collection of scores is
organized into subgroups, the number of scores in a subgroup is represented
by n and the total number of scores in all subgroups by N. (26, 59)

Normal distribution. (199)

Normal probability distribution. (198)

Magnitude of a relative frequency. (22)

Proportion of objects of a certain type in a collection. (252)

Power of a test of a statistical hypothesis. (287)

Percentile rank. (70)

The percentile indicated by the subscript. E.g., P_5 = the 5th
percentile. (70)

Probability of a w-type object in a universe U. (193)

Semi-interquartile range. (136)

First, second, and third quartiles, respectively. (71)

Product-moment correlation coefficient for a real collection of pairs of
scores, i.e., pairs of X- and Y-values. (874, 396)

Range of a collection of scores. I.e., the difference between the largest
and smallest score values. (136)

Critical region or region of rejection used in testing a statistical
hypothesis. (272)

Affixed as a subscript to the symbol for a statistic to represent a
boundary point of a critical region. (271)

Relative cumulative frequency of a class in a frequency distribution. (80)

Relative frequency of a class in a frequency distribution. (43)




Ri 


s 


SSreg 


SSres 


ssr 


tye 


7/ 




Lower boundary of a critical region located at the upper end of a sampling
distribution. (299)

Upper boundary of a critical region located at the lower end of a sampling
distribution. (299)

Standard deviation (s) and variance (s²) of a real collection of
scores. (140, 139)

Standard deviation (s_Ŷ) and variance (s²_Ŷ) of a real collection of
predicted (i.e., regressed) scores. (422, 421)

Standard error of estimate (s_y·x) for a real collection of predicted
(i.e., regressed) scores and the corresponding variance (s²_y·x). (430)

Value of the smallest score in a collection. (136)

Standard deviation of a linearly transformed collection of scores. I.e.,
the standard value adopted as the standard deviation of a system of
standard scores. (163)

Any statistic. (239)

Sum of squares of deviations of predicted (i.e., regressed) Y-values from
the over-all mean of the actual Y-values. (439)

Sum of squares of deviations of actual Y-values from predicted (i.e.,
regressed) Y-values. (440)

Sum of squares of deviations of actual Y-values from the over-all mean of
these values. (439)

A test statistic the sampling distribution of which is "Student's"
distribution. (338)

Standard score derived by an area transformation which results in a
normally distributed set of T-values having a mean of 50 and a standard
deviation of 10. (223)

A distance on the scale of the t-distribution which, if marked off in one
direction from center (zero), establishes a segment of the t-scale such
that the probability of a t-value in this segment is one-half of an
arbitrarily selected probability value represented by γ. (358)

Universe or population. (193)




W 


D 


қо 


x 


N 


2/2 


Any standard score derived by application of an area 
transformation. (2/9) 


Magnitude of a raw score. Also the value of a class 
(interval) midpoint in a frequency distribution. (14) 
Algebraic (signed) deviation of a score from the mean 
(X) of the collection to which it belongs. (139) 


Absolute value of the deviation of a score from the 
mean of the collection to which it belongs. (138) 


Mean of a real collection of seores which are represented 
by X (or Y). (102) 


Ordinate of the normal curve in standard-score (z) 
form. (/82) 


Ordinate of a normal curve. (180) 


Actual or true value of the dependent variable (Y) for 
a given individual in a prediction (i.e., regression) 
situation. (498) 


Deviation of a predicted (i.¢., regressed) score from the 
over-all mean of the actual scores for a real collection of 
data. (420) 


Predicted (i.e., regressed) score. Actually an estimate 
of a subpopulation mean obtained by use of the least- 
squares line of best fit. (419) 


Standard score derived from a raw score by a linear 
transformation which fixes zero and unity as the mean 
and standard deviation of the transformed collection. 
(159) 


Used to designate a normally distributed variate having
mean and variance of zero and unity. (182)


Standard score derived from a raw score by a linear
transformation which fixes M (M ≠ 0) and S (S ≠ 1)
as the mean and standard deviation of the transformed
collection. I.e., M and S are arbitrarily selected
standard values other than zero and unity. (164)


A distance on the scale of the unit normal distribution
(i.e., mean = 0, variance = 1) which, if marked off
in one direction from center (zero), establishes a
segment of the scale such that the probability of z
in this segment is one-half of an arbitrarily selected
probability value represented by γ. (328)


z_r Fisher logarithmic transformation of a product-moment
correlation coefficient for a sample from a normal
bivariate population. (463)


z_ρ Fisher logarithmic transformation of the product-moment
correlation coefficient for a normal bivariate
population. (463)


Predicted (i.e., regressed) score when the given pairs of 
score values are expressed in standard form (i.e.,
as z-scores). (420) 
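The standard-score entries and the Fisher logarithmic transformation defined in this part of the glossary can be illustrated with a short computational sketch. The function names and the sample scores are hypothetical, chosen only for illustration. The sketch assumes that the standard deviation is computed with divisor N, and it uses the usual form of the Fisher transformation, one-half the natural logarithm of (1 + r)/(1 − r).

```python
import math
from statistics import fmean, pstdev


def z_scores(raw):
    """Linear transformation fixing the mean at 0 and the S.D. at 1."""
    mean = fmean(raw)
    sd = pstdev(raw)  # assumption: divisor N rather than N - 1
    return [(x - mean) / sd for x in raw]


def rescaled_scores(raw, new_mean, new_sd):
    """Linear transformation fixing arbitrary values M and S
    as the mean and standard deviation of the new collection."""
    return [new_sd * z + new_mean for z in z_scores(raw)]


def fisher_z(r):
    """Fisher logarithmic transformation of a correlation coefficient."""
    return 0.5 * math.log((1 + r) / (1 - r))


scores = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical raw scores
z = z_scores(scores)                        # mean 0, S.D. 1
big_z = rescaled_scores(scores, 50, 10)     # mean 50, S.D. 10
print(z)
print(big_z)
print(round(fisher_z(0.5), 4))
```

Choosing M = 50 and S = 10 in the second function gives a scale with the same mean and standard deviation as the T-scale, but the result is only a linear rescaling; it does not normalize the distribution as an area transformation does.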


3. GREEK LETTER SYMBOLS 


(Lower-case alpha) Level of significance used in testing 
a statistical hypothesis. (267) 


(Lower-case beta) Probability of a Type II error in test- 
ing a statistical hypothesis. (284) 


Slope of the population prediction (i.e., regression)
line. (445)


(Lower-case gamma) Confidence coefficient associated
with an interval estimate. The proportion of intervals
that contain the parameter in the theoretical universe
of such intervals. (321)


(Upper-case delta) Difference between the means of 
two populations. (348) 


(Lower-case theta) Any population parameter. 
(239) 


Limits of an interval estimate of the population parameter
represented by θ. E.g., μ̲, μ̄ represent the limits
of an interval estimate of the population mean μ.
(325)


(Lower-case mu) Mean of a population or of a hypothetical
collection of scores. (102)


2 of an interval estimate of a population mean #- 
327) 


Mean of a population of differences between pairs of
scores. (254)




ир Difference between the means of two populations 
(350) | 


μ_e Mean of the error distribution of predicted (i.e., regressed)
scores for a given X-value. (458)


"m Overall Y-mean for all subpopulations in a regression 
situation. (449) 


pm Mean of Y-scores for a subpopulation of individuals 
making the same X-score. (445) 
ξ (Lower-case xi) Median of a population. (249)
π (Lower-case pi) Ratio of the circumference of a circle to
its diameter. Approximately 3.1416. (180)
ρ (Lower-case rho) Product-moment correlation coefficient
for a population. (448)
σ, σ² (Lower-case sigma) Standard deviation (σ) and variance
(σ²) of a population. (180)
σ_b, σ_b² Standard error (σ_b) and variance (σ_b²) of the sampling
distribution of the slope of a prediction (i.e., regression)
line. (446)
σ_D, σ_D² Standard deviation (σ_D) and variance (σ_D²) of a
population of differences between pairs of scores.
(224)
σ_e, σ_e² Standard error (σ_e) and variance (σ_e²) of the error
distribution of predicted (i.e., regressed) scores for a
given X-value. (497)
σ_{Mdn}, σ²_{Mdn} Standard error (σ_{Mdn}) and variance (σ²_{Mdn}) of the
sampling distribution of the median. (250)


σ_p, σ_p² Standard error (σ_p) and variance (σ_p²) of the sampling
distribution of a proportion. (252)


σ_{p1−p2}, σ²_{p1−p2} Standard error (σ_{p1−p2}) and variance (σ²_{p1−p2}) of the
sampling distribution of differences between two proportions. (256)


σ_X̄, σ_X̄² Standard error (σ_X̄) and variance (σ_X̄²) of the sampling
distribution of the mean. (248)


σ_{X̄1−X̄2}, σ²_{X̄1−X̄2} Standard error (σ_{X̄1−X̄2}) and variance (σ²_{X̄1−X̄2}) of the
sampling distribution of differences between two means.
(256)


σ_Ȳ, σ_Ȳ² Standard error (σ_Ȳ) and variance (σ_Ȳ²) of the sampling
distribution of the over-all Y-mean for all subpopulations
in a prediction (i.e., regression) situation. (447)


σ_Ŷ, σ_Ŷ² Standard error (σ_Ŷ) and variance (σ_Ŷ²) of the sampling
distribution of a predicted (i.e., regressed) subpopulation
mean. (423)


σ_{y·x}, σ²_{y·x} Standard deviation (σ_{y·x}) and variance (σ²_{y·x}) of
Y-scores for a subpopulation of individuals making
the same X-score. σ_{y·x} is the population value of the
standard error of estimate. (427)


σ_z, σ_z² Standard error (σ_z) and variance (σ_z²) of the sampling
distribution of the Fisher logarithmic transformation
of the product-moment correlation coefficient for
samples from a normal bivariate population. (464, 462)


Σ (Upper-case sigma) Summation sign. Indicates "the
sum of." (48)


φ (Lower-case phi) Proportion of objects or individuals
of a certain type in a population. (252)
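Several of the standard errors defined in this section reduce to one-line computations once the population values are given. The following sketch is illustrative only: the function names are hypothetical, and the formulas used are the familiar ones, namely σ/√N for the mean, 1.25 times that value for the median (normally distributed populations only), and √(φ(1 − φ)/N) for a proportion.

```python
import math


def se_mean(sigma, n):
    """Standard error of the sampling distribution of the mean."""
    return sigma / math.sqrt(n)


def se_median_normal(sigma, n):
    """Standard error of the median.
    Assumption: the population is normally distributed."""
    return 1.25 * se_mean(sigma, n)


def se_proportion(phi, n):
    """Standard error of a proportion, where phi is the
    population proportion of individuals of a certain type."""
    return math.sqrt(phi * (1 - phi) / n)


# Hypothetical population values for illustration.
print(se_mean(10, 100))
print(se_median_normal(10, 100))
print(se_proportion(0.5, 100))
```

The comparison of the first two results shows why the mean is usually preferred to the median as an estimate of central tendency for a normal population: for the same N, the median has the larger standard error.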




APPENDIX B 


SELECTED FORMULAS 
AND RULES 



Page Formula 

50 = № 

52 Xx ~ Уу Х, 
j= i=l 

Я N 

52 È fX 2X 
j= i=l 

52 ZCX;-2CZX; 


53 


54 


56 
56 
59 


102 


108 


480 


EfX;- МУр,Х, 
ZX?) = NEpX?; 


k nj D 
Xj- 
‚> i=l i P 
k 
N-Yn 
7=1 
$\bar{X} = \frac{\sum X_i}{N}$
$\sum X_i = N\bar{X}$




Number 


(3.10) 


(3.17) 


(3.18) 


(3.19) 


(3.20) 


(3.25) 
(3.26) 


Ny nk 


uc > Xoit + p Хы (3.29) 


(3.33) 


(5.1) 


(5.2) 




Page Formula 


112 $M_{X+C} = \bar{X} + C$

113 $M_{CX} = C\bar{X}$

114 $M_{CX+D} = C\bar{X} + D$

115 $\sum (X_i - \bar{X}) = \sum x_i = 0$

117 $\sum |X_i - A|$ is least when $A = Mdn$
126 $E(X) = \sum p_i X_i = \bar{X}$


gal 


5 № — Со 
132 Мап = Ls0+ per UE 


Му cf. 
133 p.— pp te (ibe) 


(Uso — Lso) 


b Ti r т 
188 мр== T » de 2| X:i— XI 
139 82 = Zea, ы, 
140 sz Xr a mx 

N 

zx е 

Hi зара AU کچ‎ 
148 EX gEXxAS ZX5 44 
4 82 N = N N X 




Number 


(5.8) 
(5.9) 
(5.10), (5.12) 


(5.13) 


(5.14), (5.15) 


(6.7), (6.8) 


481 


Page Formula Number 


= r2 У AE > e's es 
144 82 ES ? is ‘) = 2— X? (6.9), (6.10) 
146 S^ corr. for grating ~ S minada — 5 (6.13) 
where h = size of interval 
147 $S^2_{X+C} = S^2_X$ (6.15)
148 $S^2_{CX} = C^2 S^2_X$ (6.17)
149 $S_{CX} = |C|\,S_X$ (6.18)
159, 160 $z_i = \frac{X_i - \bar{X}}{S} = \frac{x_i}{S}$ (7.1), (7.2)
162 $\bar{z} = 0$ (7.4)
162 $\sum z_i = 0$ (7.5)
162 $S_z = 1$ (7.6)
162 $S^2_z = 1$ (7.7)
Ep SX | 
164,165 Z;— as X;+ [a — A = 8:4 М (7.8), (7.9) 
i _ - an 
decr. (8.1) 
a ل‎ ас) 
М2т (8.3) 
а 
191 $z = \frac{x}{\sigma}$, $x = X - \mu$ (8.4)
191 $X = \sigma z + \mu$ (8.5)
191 Y=} (3.6) 
кел 
193 P(w | U) = ү (8.7) 
19] — P(w|U)+P0w|U)=1 i 




Formula 


P(w or x | U) = P(w | U) + P(x | U)


ау = а, 
C=C ——— (cz — с) 
е Par 


X-—LX 


E-S—Q 
E E 
ү 

= 

STAN 


$\sigma_{Mdn} = 1.25\,\sigma_{\bar{X}}$ (for ND population only)


Фа — Ф) 
N 


2 
C= 


= $i = $1) | dol = n 


ni пэ 


2 
9-р, 


1— ф1) 2(1 — de) 
REINES 


пә 
NeT s 
Misses ^ gi 




Number 
(8.9) 


(8.10) 


(8.11) 


(9.1) 
(9.2) 
(9.3) 
(9.7) 


(9.8) 


(9.9) 


(9.10) 


(9.11) 
(9.12) 
(9.13) 
(9.14) 


(9.15) 


(9.16) 


483 


260 


260 


260 


260 


261 


261 


261 


261 


261 


261 


261 


$28 


$30 


484 


Formula 


Number 


@ == Met (for a dichotomous population) (9.17) 
ne 

ME GE (9.18) 

ХОЛ NS 

an = eis (for a ND population only) (9.19) 

2 — ВО =p) (9.20) 
Р N-1 

Gig pee ety e уа г (9.21) 

9 xx, Xon 3g Vac aes 

g2, .,, = ECL pi) ү pall — рз) (9.22) 
91—98 т = 1 то — 1 

a= f (9.23) 

с = мир) (fora dichotomous population) (9.24) 

)9.25( کے = چ 

"E VNCT ; 

" 1.258 А А 

Cui MN (fora ND population only) (9.26) 

a, = AED (9.27) 

~ = 87) 825 

he Ael mel (9.28) 

(1— »(1 — рә 

Us = рыр) Hm be — po) (9.29) 
Е 8 

B= SUA UICE Lo dum (11.32), (11.3b) 

1.258 
£ &= mdni Шел (11.4a), (11.4b) 




Page Formula Number 
a CA ME eMe уге pill — pi) (11.6), 
981 оа тра рка, N (11.7a), (11.7b) 
(also see formula in footnote, p. 332) 
3 A= PB 2.9 pera 82 
333 A, A= Dı 2,2 NE IPIS (11.82), (11.8b) 
843 (= = 1) = Sl үл (12.3) 
347 3» тё + поё?» a 
: ы nı + na — 2 (12.5) 
Xi— Xs 
848 = „— 2( = = 12:8 
t(df = п + n2 ) = етрнет а E ^ т) ( ) 
nytne—2 ny no 
850 =F g al (12.10) 
8p 
ae pel EET E a ed 
358 a h= Ni Fhe TFT df=N-1 (12.12a), (12.12b) 
em 2 28% /1 1 
859 w = Иба + поё» [1 1), 
à АА Bi Fhe tes "up 
df=n+tne—2 (12.13a), (12.13b) 
E = 8 T é 
360 Mp, Ёр = Dı Fhe df=N—1 (12.142), (12.14b) 
Zzxgv, 
874 TE x Ys (13.1) 
389 Zr pL nnd =й —} Р 
=й ш=Х;—Х and y:=Y;— Y (13.2) 
= 
389 Erg: Ж a ee e 
MGE X;— X and y,—Y;—Y (133) 
ZX (EY: 
HB کے پو‎ x j 
a= Xi— X and y= Y;- ¥ (13.4) 




Page 


391 


898 


897 


Formula Number 


TY ovy: 
SE Bach à 
r= 


uus Im AES 
УУ 2 аи ee шщ 
NES i N 2: ; ү 


с MAN > Som (13.9) 
"d CX+A)(DY+B) (Grae 29) 

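$\hat{Y} = bX + c$ (14.2)
$G = \sum (Y_i - \hat{Y}_i)^2 = \sum [Y_i - (bX_i + c)]^2$ (14.3)

$b = \frac{\sum x_i y_i}{\sum x_i^2}$, $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$ (14.4)
$c = \bar{Y} - b\bar{X}$ (14.5)
$\hat{Y} = b(X - \bar{X}) + \bar{Y} = bx + \bar{Y}$ (14.6), (14.6a)
$\hat{y} = bx$ (14.7)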
02 

md (14.8) 
ICA MES 8 = 

eee wl =F G+ Y (14.9), (14.92) 
йз LM 

Meet (14.10) 
pd (14.11) 
а (14.12) 
жо (14.13) 
2289 

ay (14.15) 




Page Formula Number 


& n 
> Nat 
426 Bie y yj Үз У; (14.16) 
E i 
SS 
از = بے‎ М 
427 Daci » Cu—Yju-—Y; (14.17) 
Dx Ws 
WS UN MUI 
25 219 7 N 
427 õp = |== og = Vn P; (14.18) 


480 (14.20) 
430 Жез SO), aS Үг Ў y= YY (14.21) 
430 $S^2_{y \cdot x} = S^2_y (1 - r^2)$ (14.22)
430 $S_{y \cdot x} = S_y \sqrt{1 - r^2}$ (14.23)
43 = Ху 2 __ NSP yes 4:58 

491 Ж Т ле бы жг 2 (14.24) 


LIE wc Vs (14.25) 
+ (14.26) 
) 


ш= У 
434 82 کم‎ bey (14.27) 
489 Srey = DP, h= Pi Y gU 
439 speES, е2: Т (14.29) 
440 жу Se, Bes Yp— Ve (14.30) 
440 en (14.31) 
440 ee 9 (14.32) 


we N—2 




Number 
Page Formula 


440 8,9 (14.33) 
440 2 pom (14.34) 
"m zn (14.35) 
442 $c_X = \bar{X} - b_X \bar{Y}$ (14.36)
442 $\hat{X} = b_X Y + c_X$ (14.37)
448 Шш = B+ uy (15.3) 
D j [x is12,e4N (15.4) 
446 съ= M (15.5) 
M6 h= NO а беч Bau (15.6), (15.7) 
ur ta z£, df=N-2 (15.8) 
H? = Fs (15.9) 
447 ср= # (15.10) 
HO 8 ча шл; (15.11), (15.12) 
we u= x acd (15.13) 
448 (15.15) 
451 df=N-2 (15.17) 




Formula Number 


8. B=b Ft, „2220020 = ag (15.18) 


(N — 2)(5г2,)2 


df=N—2 
z Er? (Ey) — (Er;y;)? Е 
JE ly 15.1 
Hy, My (FÉ VV — D502, (15.19) 
y= =r (15.20) 
с?р = r0, + 9? (15.21) 
oj = Vae, + т? (15.22) 


У 2j n EE „а 2 „2 
VF TET +] (15.23), (15.94) 
уру Brey = 


Ü (Zr?) (Zy?) — (Хт) x? 1 — 
LN T uj — Tus Isl (15.25) 


+ ) (15.26) 


+ 
| 
ER 
= 
— 
= 
сл 
to 
Ее] 
= 


I 


ume = tx -+ i| (15.28) (15.29)‏ د 


(N — 2a; 
Y, Y= Pi 1,20, (15.31) 
z= b og HE (15.32) 
1+ 
5 blog. үр (15.33) 
-— | 
و‎ m Nap (15.34) 




Page Formula Number 


463 (15.35) 


468 (15.36) 
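The least-squares line of best fit that underlies the Chapter 14 formulas in this list can be computed directly from deviation scores. The sketch below is an illustration under stated assumptions, not the authors' own computing routine: the function names and the five pairs of scores are hypothetical, the slope is taken as Σxy/Σx², the intercept as Ȳ − bX̄, and r as Σxy/√(Σx²Σy²), with x and y denoting deviations from the respective means.

```python
import math
from statistics import fmean


def deviations(values):
    """Deviation scores: each value minus the mean of its collection."""
    mean = fmean(values)
    return [v - mean for v in values]


def least_squares_line(xs, ys):
    """Slope b and intercept c of the prediction line Y' = bX + c."""
    dx, dy = deviations(xs), deviations(ys)
    b = sum(p * q for p, q in zip(dx, dy)) / sum(p * p for p in dx)
    c = fmean(ys) - b * fmean(xs)
    return b, c


def correlation(xs, ys):
    """Product-moment correlation coefficient r."""
    dx, dy = deviations(xs), deviations(ys)
    sum_xy = sum(p * q for p, q in zip(dx, dy))
    return sum_xy / math.sqrt(sum(p * p for p in dx) * sum(q * q for q in dy))


def sum_squared_errors(xs, ys, b, c):
    """G: sum of squared differences between actual and predicted Y."""
    return sum((y - (b * x + c)) ** 2 for x, y in zip(xs, ys))


X = [1, 2, 3, 4, 5]   # hypothetical X-scores
Y = [2, 4, 5, 4, 5]   # hypothetical Y-scores
b, c = least_squares_line(X, Y)
r = correlation(X, Y)
G = sum_squared_errors(X, Y, b, c)
print(b, c, r, G)
```

For these data G/N equals the variance of Y multiplied by (1 − r²), which is the relationship between the error variance and the correlation coefficient that makes r useful as an index of predictive accuracy.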




APPENDIX С 


TABLES 


TABLE I Squares and Square Roots of the Numbers from
1 to 1,000


Number    Square   Square Root   Number    Square   Square Root

     1         1       1.000         51     26 01       7.141
     2         4       1.414         52     27 04       7.211
     3         9       1.732         53     28 09       7.280
     4        16       2.000         54     29 16       7.348
     5        25       2.236         55     30 25       7.416
     6        36       2.449         56     31 36       7.483
     7        49       2.646         57     32 49       7.550
     8        64       2.828         58     33 64       7.616
     9        81       3.000         59     34 81       7.681
    10      1 00       3.162         60     36 00       7.746
    11      1 21       3.317         61     37 21       7.810
    12      1 44       3.464         62     38 44       7.874
    13      1 69       3.606         63     39 69       7.937
    14      1 96       3.742         64     40 96       8.000
    15      2 25       3.873         65     42 25       8.062
    16      2 56       4.000         66     43 56       8.124
    17      2 89       4.123         67     44 89       8.185
    18      3 24       4.243         68     46 24       8.246
    19      3 61       4.359         69     47 61       8.307
    20      4 00       4.472         70     49 00       8.367
    21      4 41       4.583         71     50 41       8.426
    22      4 84       4.690         72     51 84       8.485
    23      5 29       4.796         73     53 29       8.544
    24      5 76       4.899         74     54 76       8.602
    25      6 25       5.000         75     56 25       8.660
    26      6 76       5.099         76     57 76       8.718
    27      7 29       5.196         77     59 29       8.775
    28      7 84       5.292         78     60 84       8.832
    29      8 41       5.385         79     62 41       8.888
    30      9 00       5.477         80     64 00       8.944
    31      9 61       5.568         81     65 61       9.000
    32     10 24       5.657         82     67 24       9.055
    33     10 89       5.745         83     68 89       9.110
    34     11 56       5.831         84     70 56       9.165
    35     12 25       5.916         85     72 25       9.220
    36     12 96       6.000         86     73 96       9.274
    37     13 69       6.083         87     75 69       9.327
    38     14 44       6.164         88     77 44       9.381
    39     15 21       6.245         89     79 21       9.434
    40     16 00       6.325         90     81 00       9.487
    41     16 81       6.403         91     82 81       9.539
    42     17 64       6.481         92     84 64       9.592
    43     18 49       6.557         93     86 49       9.644
    44     19 36       6.633         94     88 36       9.695
    45     20 25       6.708         95     90 25       9.747
    46     21 16       6.782         96     92 16       9.798
    47     22 09       6.856         97     94 09       9.849
    48     23 04       6.928         98     96 04       9.899
    49     24 01       7.000         99     98 01       9.950
    50     25 00       7.071        100   1 00 00      10.000


TABLE I Squares and Square Roots of the Numbers from
1 to 1,000 (Continued) 


Number Square Square Root Number Square Square Root 
101 10201 10.050 151 22801 12.288 
102 10404 10.100 152 23104 12.329 
103 10609 10.149 153 23409 12.369
104 10816 10.198 154 2 37 16 12.410 
105 11025 10.247 155 24025 12.450 
106 11236 10.296 156 2 43 36 12.490 
107 11449 10.344 157 2 46 49 12.530 
108 1 16 64 10.392 158 2 49 64 12.570 
109 11881 10.440 159 25281 12.610 
110 12100 10.488 160 25600 12.649 
111 1 23 21 10.536 161 2 59 21 12.689 
112 1 25 44 10.583 162 2 62 44 12.728
113 1 27 69 10.630 163 2 65 69 12.767 
114 1 29 96 10.677 164 2 68 96 12.806 
115 1 32 25 10.724 165 2 72 25 12.845 
116 1 34 56 10.770 166 2 75 56 12.884 
117 1 36 89 10.817 167 27889 12.923 
118 1 39 24 10.863 168 28224 12.961 
119 1 4161 10.909 169 28561 13.000 
120 1 44 00 10.954 170 2 89 00 13.038 
121 1 46 41 11.000 171 2 92 41 13.077 
122 1 48 84 11.045 172 2 95 84 13.115 
123 15129 11.091 173 2 99 29 13.153 
124 153 76 11.136 174 3 02 76 13.191 
125 15625 11.180 175 306 25 13.229 
126 1 58 76 11.225 176 3 09 76 13.266 
127 1 61 29 11.269 177 31329 13.304 
128 1 63 84 11.314 178 31684 13.342 
129 16641 11.358 179 32041 13.379 
130 1 69 00 11.402 180 3 24 00 13.416 
131 17161 11.446 181 3 27 61 13.454 
132 17424 11.489 182 33124 13.491 
133 17689 11.533 183 334 89 13.528 
134 179 56 11.576 184 33856 13.565 
135 18225 11.619 185 34225 13.601 
136 1 84 96 11.662 186 3 45 96 13.638 
137 1 87 69 11.705 187 3 49 69 13.675 
138 1 90 44 11.747 188 3 53 44 13.711 
139 1 93 21 11.790 189 35721 13.748 
140 1 96 00 11.832 190 3 61 00 13.784 
141 198 81 11.874 191 3 64 81 13.820 
142 201 64 11.916 192 3 68 64 13.856 
143 2 04 49 11.958 193 3 72 49 13.892 
144 20736 12.000 194 3 76 36 13.928 
145 21025 12.042 195 3 80 25 13.964 
146 21316 12.083 196 384 16 14.000 
147 21609 12.124 197 3 88 09 14.036 
148 21904 12.166 198 3 92 04 14.071 
149 22201 12.207 199 39601 14.107
150 2 25 00 12.247 200 4 00 00 14.142




TABLE I Squares and Square Roots of the Numbers from 
1 to 1,000 (Continued) 


Number Square Square Root Number Square Square Root 
201 40401 14.177 251 6 30 01 
202 4 08 04 14.213 252 6 35 04 
203 41209 14.248 253 6 40 09 
204 416 16 14.283 254 64516 7 
205 42025 14.318 255 65025 15.969 
206 4 24 36 14.353 256 16.000 
207 4 28 49 14.387 257 16.031 
208 432 64 14.422 258 6 65 64 16.062 
209 43681 14.457 259 6 70 81 16.093 
210 44100 14.491 260 6 76 00 16.125 
211 44521 14.526 261 68121 
212 4 49 44 14.560 262 6 86 44 
213 4 53 69 14.595 263 6 91 69 ; 
214 4 57 96 14.629 264 6 96 96 16.248 
215 4 62 25 14.663 265 70225 16.279 
216 4 66 56 14.697 266 707 56 16.310 
217 470 89 14.731 267 71289 16.340 
218 4 1524 14.765 268 71824 16.371 
219 4 79 61 14.799 269 72361 16.401 
220 4 84 00 14.832 270 7 29 00 16.432 
221 4 88 41 14.866 271 73441 
222 49284 14.900 272 т 39 84 
223 4 97 29 14.933 273 7 45 29 
224 501 76 14.967 274 750 76 
225 506 25 15.000 275 7 56 25 
226 276 7 61 76 
297 277 76729 
228 278 77284 : 
229 279 77841 16.703 
0 280 7 84 00 16.733 
231 281 sod oF 
232 282 79594 16:793 
233 283 8 00 89 
234 284 8 06 56 
235 285 81225 
26 5 m 286 817 06 16.912 
37 561 287 8 23 69 16.941 
238 5 66 44 288 8 29 44 16.971 
239 57121 15.460 289 83521 ‘000 
240 57600 15492 ^ Em 0 
5.44 290 841 00 17.029 
241 5 80 81 15.524 291 84681 17.059 
242 5 85 64 1 292 17.088 
243 5 90 49 1 293 17117 
244 5 95 36 1 294 17.146 
245 6 00 25 15.67 295 8 10 55 17.176 
246 6 05 16 15.684 296 5 
247 61009 15716 297 882 00 17231 
248 61504 15.748 298 8 88 04 17.263 
249 62001 15.780 299 89401 17:292 
250 6 25 00 15.811 300 900 00 17.321 


TABLE I Squares and Square Roots of the Numbers from
1 to 1,000 (Continued)


Number Square Square Root Number Square Square Root 
301 90601 14, 7. 349 
302 91204 
303 9 18 09 У 
304 9 24 16 18.815 
305 9 30 25 18.841 
306 9 36 36 12 67 36 18.868 
307 94249 127449 18.894 
308 948 64 1281 64 
309 95481 12 88 81 
310 96100 12 96 00 
311 13 03 21 19.000 
312 13 10 44 19.026 
313 13 17 69 19.053 
314 13 24 96 19.079 
315 13 32 25 19.105 
316 9 98 56 17.776 366 13 39 56 19.131 
317 10 04 89 17.804 367 13 46 80 19.157 
318 O11 24 17.833 368 13 54 24 19.183 
319 01761 17.861 369 13 61 61 
320 10 24 00 17.889 370 13 69 00 
321 10 30 41 17.916 371 13 76 41 
322 0 36 84 17.944 372 138384 
323 10 43 29 17.972 373 13 91 29 
324 10 49 76 18.000 374 13 98 76 
325 10 56 25 18.028 375 14 06 25 
326 062 76 18.055 376 19.301 
327 10 69 29 18.083 377 19.416 
10 75 84 18.111 378 19.142 
10 82 41 18.138 379 19.468 
0 89 00 18.166 380 19.494 
10 95 61 18.193 381 
10224 18.221 382 
f 18.248 383 
18.276 384 
18.303 385 
18.330 386 
18. 387 
18.385 388 ¢ 
18.412 389 19. 723 
18.439 390 19.748 
18. 166 391 19.774 
392 19.799 
393 
391 
395 
1197 16 396 
12 04 09 397 
12 11 04 398, 
4t 12 I8 01 399 75 
E 350 12 25 00 18.708 400 20.000 


TABLE I Squares and Square Roots of the Numbers from
1 to 1,000 (Continued)


Number Square Square Root Number Square Square Root 
401 1608 01 20.025 45 І 
402 16 16 04 20.050 492 20 43 01 21200 
403 16 24 09 20.075 453 20 52 09 : 

j 24 09 5 20 52 09 21.284 
404 16 32 16 20.100 451 20 61 16 21.307 
405 16 40 25 20.125 455 2 5 21. 
55 20 70 25 21.331 
406 16 48 36 20.149 5 35 
407 16 56 49 20.174 2 tee EU 
) 457 20 88 49 21.378 
408 16 64 64 20.199 458 20 97 61 21401 
409 16 7281 20.221 459 210681 21424 
410 168100 205218 400 211600 e s 
411 16 89 21 20.273 E 
412 16 97 44 20.298 01 512921 
418 17 05 69 20.322 463 AEG 
414 17 13 96 20.347 461 1 А089 
415 172225 00372 Шо ООО 
B 2 5 
416 17 30 56 20.396 А 5 5 
417 173889 20421 1 шам 21080 
н tee He 467 21 80 89 21:610 
419 17 55 61 20.469 E e 
420 17 64 00 20.494 470 ара 21.650 
421 177241 20.51 aa 
429 17 80 84 20.243 A 22 1841 21:03 
423 178929 20567 4m 222788 217% 
424 179776 20591 48 223720 21749 
425 180625 20616 As RH ОЕ 
E 56 25 21.794 
426 18 14 76 20.640 А 
427 182399 20664 46 — 220570 021.817 
i uM DOR 477 22 75 29 21.840 
45 TUR пв 478 22 84 84 21.863 
430 184900 20736 io 20 41 21886 
21:909 
431 18 57 61 20. 
432 18 66 24 20:01 481 23 13 61 21.932 
85 4 
433 187489 20.809 82 232324 21.954 
434 18 83 56 20.833 bis: 23 32 89 21.977 
435 189225 20857 48, 284250 22000 
i 52 25 22.093 
436 190096 20881 | 
437 19 09 69 20.905 486 23 61 96 22.045 
438 191844 3 487 237169 68 
20.928 20 
439 102751 Dr 488 238144 22.091 
440 19 36 00 20.976 Eo 239121 22.113 
90 24 01 00 22.136 
441 194481 21.000 
442 19535 2100 491 211081 22.159 
443 19 62 49 21.048 288 24 20 64 22.181 
441 197136 31948 493 24 30 49 22.204 
445 19 80 25 21.095 m 214036 22.226 
á 4 50 25 4€ 
446 19 89 16 21.119 4 : wid 
447 19 98 09 21.142 96 2160 16 22.271 
448 20 07 04 21.166 487 24 70 09 22.203 
449 20 16 01 21.190 PS 24 80 04 22.316 
450 20 25 00 21.213 500 24 9001 22.338 
25 00 00 22:301 
TABLE I Squares and Square Roots of the Numbers from
1 to 1,000 (Continued)


ine А 
Number 


Square Square Root 


Number 


Square 


Square Root 


501 
502 
503 
504 
505 


506 
507 
508 
509 


5 60 36 
70 49 
80 64 
90 S1 
20 01 00 


NNE 


NNN 
JJA 


N Nt 


27 66 76 
27 77 29 
278784 
27 98 41 
28 09 00 


28 19 61 
28 30 24 
28 40 89 
28 51 56 
28 62 25 


28 72 96 
28 83 69 
28 94 44 
29 05 21 
29 16 00 


29 26 81 
29 37 64 
29 48 40 
29 59 36 
29 70 25 


29 81 16 
29 92 09 
30 03 04 
30 14 01 
30 25 00 


599 
600 


30 36 01 


30 58 09 
30 69 16 
30 80 25 


30 91 36 
31 02 49 
31 13 64 
31 2481 
31 36 00 


314721 
31 5844 
31 69 69 
31 SO 96 
31 92 25 


32 03 56 


94 76 
33 06 25 


36 00 00 


24.000 
24.021 
24.042 
24.062 
24.083 


104 


WINNY 


24.413 
24.434 
24.454 
24.474 


24.495 


TABLE I Squares and Square Roots of the Numbers from
1 to 1,000 (Continued) 
| Number Square Square Root Number Square Square Root | 
601 36 1201 12 38 01 25.515 
602 36 24 04 42 51 04 4 
603 36 36 09 42 64 09 
604 36 48 16 42 7716 
605 36 60 25 24.597 42 90 25 
606 36 72 36 24.617 656 43 03 36 
607 36 84 49 24.637 657 43 16 49 
608 36 96 61 24.658 658 43 29 64 
609 37 08 81 24.078 659 43 42 81 
610 37 2100 24.698 660 43 56 00 
611 37 33 21 24.718 661 
612 37 45 44 24.739 662 
613 37 57 69 24.759 663 
614 37 69 96 24.779 664 
615 37 82 25 24.799 665 
616 37 94 56 24.819 666 
617 38 06 89 24.839 667 
618 38 19 24 24.860 608 
619 383161 24.880 669 
620 38 44 00 24.900 670 
621 38 56 41 24.920 671 
622 38 68 84 24.940 672 
623 388129 24.960 673 
624 38 93 76 24.980 674 
625 39 06 25 25.000 675 
626 39 18 76 25.020 676 26.000 
627 39 31 29 25.040 677 26.019 
628 39 43 84 678 26.038 
629 39 56 41 679 26.058 
630 39 6900 680 26.077 
631 39 81 61 681 46 37 61 26.096 
632 39 94 24 682 46 51 24 26.115 
633 40 06 89 683 46 64 89 26.134 
634 40 19 56 684 46 78 56 26.153 
635 40 32 25 685 46 92 25 26.173 
636 40 44 96 686 47 05 96 26.192 
637 40 57 69 687 47 19 69 26.311 
638 40 70 44 688 47 33 44 26.230 
639 40 83 21 689 47 47 21 26.249 
640 40 96 00 690 47 61 00 26.268 
641 41 08 81 691 47 7481 26.287 
642 4121 64 692 47 88 04 26.306 
643 4134 49 693 48 02 49 26.325 
644 4147 36 694 48 16 36 26.344 
645 41 60 25 695 48 30 25 26.363 
646 417316 696 48 44 16 26.382 
647 41 86 09 697 48 58 09 26.401 
648 41 99 04 698 48 72 04 26.420 
649 421201 699 48 86 01 26.439 
| 950 42 25 00 700 49 00 00 26.458 


TABLE I Squares and Square Roots of the Numbers from
1 to 1,000 (Continued) 


Number Square Square Root Number Square Square Root 
701 49 14 01 26.476 
702 49 28 04 26.495 
703 49 42 09 26.514 
704 49 56 16 20.533 
705 49 70 25 26.552 
706 49 84 36 26.571 
707 49 98 49 26.580 
708 50 12 64 26.608 758 57 45 64 
709 50 26 81 26.627 759 57 60 81 
710 50 41 00 26.646 760 57 76 00 
711 50 55 21 26.665 761 57 91 21 
712 50 69 44 26.683 762 58 06 44 
713 50 83 69 26.702 763 58 21 69 
714 50 97 96 26.721 764 58 36 96 
715 51 1225 26.739 765 58 52 25 27.659 
716 51 26 56 26.758 766 58 67 56 27.677 
717 51 40 89 26.777 767 58 82 89 27.695 
718 515524 26.796 768 58 98 24 27.713 
719 51 69 61 26.814 769 59 13 61 27.731 
720 51 8400 26.833 770 59 29 00 27.749 
721 5198 41 26.851 771 59 44 41 27.767 
722 52 12 84 26.870 772 59 59 84 27.785 
723 52 27 29 26.889 773 59 75 29 27.803 
724 52 41 76 26.907 774 59 90 76 27.821 
725 52 56 25 26.926 775 60 06 25 27.839 
726 52 70 76 26.944 776 60 21 76 27.857 
727 52 85 29 26.963 TIT 60 37 29 27.875 
728 52 99 84 26.981 778 60 52 84 7.893 
729 53 14 41 27.000 779 60 68 41 27.911 
730 53 29 00 27.019 780 60 84 00 27.928 
731 53 43 61 27.037 781 60 99 61 27.946 
732 53 5824 27.055 782 6115 24 27.964 
733 53 72 89 27.074 783 6130 89 27.982 
734 53 87 56 27.092 784 61 46 56 28.000 
735 54 02 25 27.111 785 616225 28.018 
736 54 16 96 27.129 786 61 77 96 28.036 
737 54 31 69 27.148 787 61 93 69 28.054 
738 54 46 44 27.166 788 62 09 44 28.071 
739 54 61 21 27.185 789 62 25 21 28.089 
740 54 76 00 27.203 790 62 41 00 28.107 
741 54 90 81 27.221 791 62 56 81 28.125 
742 27.240 792 62 72 64 28.142 
743 27.258 793 62 88 49 28.160, 
744 27.276 794 63 04 36 28.178 
745 27.295 795 63 20 25 28.196 
746 55 65 16 27.313 796 63 36 16 28.213 
747 55 80 09 27.331 797 63 52 09 28.231 
748 559504 27.350 798 636804 28249 
749 56 1001 27.368 799 63 84 01 28.267 
750 56 25 00 27.386 800 64 00 00 28.284 




TABLE I Squares and Square Roots of the Numbers from 
1 to 1,000 (Continued) 


Number Square Square Root Number Square Square Root 
801 641601 724201 
802 6432 04 04 
803 64 48 09 09 
804 64 64 16 129316 
805 618025 73 1025 
806 6196 36 T3 27 36 
807 65 12 49 73-4 49 
808 73 6164 
809 737881 
810 73 96 00 
811 861 711321 
812 862 74 30 44 
813 863 7447 69 
814 864 74 64 96 
815 865 74 8225 
816 866 74 99 56 
817 66 74 89 867 т: 
818 66 91 21 868 1 
819 67 07 61 869 7 
820 67 2100 870 7 
821 67 40 41 871 75 86 41 
822 76 03 84 
823 7 2129 
824 67 89 76 
825 68 06 25 16 56 25 
826 68 22 76 876 76 73 76 
76 91 29 
828 77 08 84 
829 с 77 2641 
830 68 89 00 880 77 44 00 29.665 
831 69 05 61 776161 29.682 
882 Ti 1924 
833 ) 77 96 89 
834 55 78 
835 69 72 25 78: 
836 69 88 90 886 78 49 96 29.766 
837 70 05 69 887 78 67 69 29.783 
838 70 22 44 888 54: 29.799 
839 70 39 21 889 79 03 21 29.816 
810 70 56 00 890 79 21 00 29.833 
841 70 7281 891 ‹ 
812 70 89 64 892 7 36 ба 
843 71 06 49 893 79 74 49 29.883 
844 71 23 36 894 79 92 36 29.90 
845 71 40 25 29. 069 895 80 10 25 29.916 
846 715716 29.086 896 j 
847 717109 29.103 897 an 46 09 29 950 
848 719104 29.120 898 80 64 04 E 
849 72 08 01 29.138 899 80 82 01 29.983 
850 72 2500 29.155 900 8100 00 30000 — | 


TABLE I Squares and Square Roots of the Numbers from
1 to 1,000 (Concluded)


т 


Number Square Square Root Number Square Square Root 
901 SI IS 01 30.017 951 90 4401 30. 
902 8I 3601 30.033 2 90 63 04 30. 
903 SI 5400 30.050 90 82 00 30.871 
904 81 72 16 30.067 910116 30.887 
905 S1 90 25 30.083 91 20 25 30.903 
906 30.100 91 39 36 
907 1 
908 91 77 61 
900 919681 
910 92 16 00 
911 31.000 
912 31.016 
913 31.032 
914 31.048 
915 31.064 
916 83 90 56 966 93 31 56 31.081 
917 S4 08 89 967 93 50 89 31.097 
918 S4 27 24 968 93 70 24 31.113 
919 S4 45 61 969 93 89 61 31.129 
920 84 64 00 970 94 09 00 31.145 
921 S4 S2 41 30.348 971 94 28 41 3l. 
922 30.364 972 94 47 81 ЗІ. 
923 30.381 973 31.19: 

30.397 974 31.2 
30.414 975 31 
30.- 976 31.241 
85 03 29 Б 078 
S6 LL SH 30.46: of M. 
86 3041 30.480 979 95 S41 
30.496 950 96 04 00 


86 49 00 


8704 89 


87 2356 
87 4225 


87 60 96 
87 79 69 
87 98 44 
88 17 21 
88 36 00 


89 30 25 


89 49 16 
89 68 09 
89 87 04 
90 06 01 
90 25 00 


30.594 
30.610 
30.627 
30.643 
30.659 


30.676 
30.602 
30.708 

Я 


990 


991 
992 
993 
904 
905 


996 
997 
998 
999 
1000 


96 23 61 


97 21 06 
97 41 69 
97 6144 
97 81 21 
98 01 00 


98 20 S1 
9S 40 64 
98 60 40 
Ох SO 36 
99 00 25 


99 20 16 
99 40 09 
90 60 04 
99 80 01 
100 00 00 


31.623 
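Because every entry in Table I is determined by arithmetic, any row can be regenerated or checked by machine. The following sketch shows one way of doing so; the function names and the column widths are hypothetical, and square roots are rounded to three decimal places as in the table.

```python
import math


def table_row(n):
    """Number, its square, and its square root to three decimals."""
    return n, n * n, round(math.sqrt(n), 3)


def format_row(n):
    """One printed line of the table for the number n."""
    number, square, root = table_row(n)
    return f"{number:>6} {square:>9} {root:>8.3f}"


# Print the first five rows and the last row of the table.
for n in list(range(1, 6)) + [1000]:
    print(format_row(n))
```

A check of this kind is useful when a printed or scanned copy of the table is suspected of containing a misprint, since a single call to `table_row` settles the correct entry.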




TABLE II Normal Curve Areas and Ordinates*


Col. 1   Col. 2             Col. 3                 Col. 4   Col. 5               Col. 6     Col. 7     Col. 8
+z       Proportion μ to z  Proportion beyond ±z   y        y as a % of y at μ   PR of +z   PR of −z   −z

0.00 .0000 1.0000 .3989 100.00 50.00 50.00 0.00
+ 0.01 .0040 .9920 .3989 99.99 49.60 = 001 
+ 0.02 .0080 .9840 .3989 99.98 19.20 - Ne 
+ 0.03 .0120 .9761 .3988 99.95 48.80 — 0. 08 
+ 0.04 .0160 .9681 -3986 99.92 48.40 = 0.0 
+ 0.05 .0199 -9601 3984 99.87 48.01 — 0.05 
+ 0.06 .0239 .9522 .3982 99.82 47.61 — 0.06 
+ 0.07 0279 9442 3980 99.76 47.21 — 0.07 
+ 0.08 .0319 .9382 -3977 99.68 53.1 36.81 - 005 
+ 0.09 .0359 .9283 .3973 99.60 53.59 46.41 — 0.0€ 
+ 0.10 .0398 .9203 .3970 99.50 46.02 — 0.10 
+ 0.11 .0438 .9124 .3965 99.40 45.62 — 0.11 
+ 0.12 0478 .9045 :3961 99.28 45.22 – 0.12 
+ 0.13 .0517 .8966 .3956 99.16 44.83 — 0.13 
0.14 .0557 .8887 -3951 99.02 44.43 — 0.14 
+0.15 .0596 .8808 3945 98.88 44.04 — 0.15 
+ 0.16 .0636 .8729 :3939 98.73 43.64 — 0.16 
+ 0.17 .0675 :8650 +3932 98.57 56.7: 43.25 = 0.17 
+ 0.18 .0714 :8572 .3925 98.39 57.14 42.86 — 0.18 
+ 0.19 .0753 .8493 -3918 98.21 57.53 42.47 — 0.19 
+ 0.20 0793 8415 :8910 98.02 57.93 42.07 — 0.20 
+ 0.21 .0832 .8337 3902 97.82 58.32 41.68 — 0.21 
+ 0.22 -0871 .8259 .3894 97.61 58.71 41.29 — 0.22 
+0.23 | .0910 8181 -3885 97.39 40.90 | — 0.23 
+ 0.24 .0948 :8103 :3876 97.16 59. 40.52 — 0.24 
+ 0.25 .0987 -8026 -3867 96.92 59.87 40.13 — 0.25 
+ 0.26 .1026 7949 .3857 96.68 60.26 39.74 
+ 0.27 1064 7872 .3847 96.42 60.64 39.36 
+ 0.28 .1103 7195 -3836 96.16 61.03 38.97 
+ 0.29 1141 77118 .3825 95.88 61.41 38.59 
+ 0.30 1179 7642 .3814 95.60 61.79 38.21 
+ 0.31 1217 -7566 95.31 62.17 37.83 
+ 0.32 1255 -7490 95.01 62.55 37.45 
+ 0.33 .1293 7414 94.70 62.93 37.07 
+ 0.34 .1331 .7339 94.38 63.31 36.69 
.1368 .7263 94.06 63.08 36.32 


* The values in the various columns of this table were derived independently from data given
in Table 1 of Biometrika Tables for Statisticians, edited by E. S. Pearson and H. O. Hartley,
which reports to seven decimal places. Because of this independent determination, corresponding
values in different columns may be inconsistent to the extent of ± .0001. For example,
for z = + 0.03 the value in Column 3 is given as .9761. On the other hand, this value
determined from the value in Column 2 of our table is 1.0000 − (2)(.0120) = .9760. We preferred
to present values of uniform accuracy throughout our table rather than to eliminate
rounding inconsistencies of the type cited. Permission to make this use of Table 1 of
Biometrika Tables for Statisticians, Cambridge University Press, edited by Pearson and Hartley,
was granted by the publishers.
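The rounding inconsistency described in this footnote can be reproduced numerically. The sketch below is only an illustration: the function names are hypothetical, and the areas are obtained from the error function in Python's standard library rather than from the Biometrika tables. It computes Columns 2, 3, and 4 for z = 0.03 and compares the independently rounded Column 3 value with the value derived from the rounded Column 2 entry.

```python
import math


def area_mean_to_z(z):
    """Column 2: proportion of the area between the mean and z."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))


def area_beyond_both(z):
    """Column 3: proportion of the area beyond +z and -z combined."""
    return 1.0 - 2.0 * area_mean_to_z(z)


def ordinate(z):
    """Column 4: ordinate y of the unit normal curve at z."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


z = 0.03
col2 = round(area_mean_to_z(z), 4)      # rounded independently
col3 = round(area_beyond_both(z), 4)    # rounded independently
col3_from_col2 = round(1.0 - 2 * col2, 4)
print(col2, col3, col3_from_col2, round(ordinate(z), 4))
```

The independently rounded Column 3 value and the value derived from the already rounded Column 2 entry differ by .0001, which is the discrepancy the footnote cites.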


TABLE II Normal Curve Areas and Ordinates (Continued)


O 1 Corn.2 Con. 3 Cor. 4 Cor. 5 Cor. 6 Cor. 7 Cor. 8 

ТРО Propor- 
+5 : TUAE tion y jasn © PR PR 223 

а/а Beyond ofyatgu| of +2 of—z 

+ 0.36 . 9 93.73 64.06 

+ 0.37 3725 93.38 64.43 

+ 0.38 3712 93.0: 

+ 0.39 .3697 

+ 0.40 .3683 

+ 0.41 .1591 .3668 

+ 0.42 .1628 35 

+ 0.43 1664 

+ 0.44 .1700 9 

+0.45 .1736 67.36 

+ 0.46 1772 .6455 89.96 67.72 

+ 0.47 .1808 6384 89.54 68.08 

+ 0.48 1844 .6312 89.12 68.44 

T 0.49 .1879 .6241 68.79 

+ 0.50 .1915 6171 69.15 

$ .1950 -3503 

+ .1985 .9485 

1 .2019 3467 

+ .2054 .3448 

+ .2088 .3429 

+ 0.56 .2123 -3410 

+ 0.57 .2157 :3391 

T .2190 .3372 

+ 0.59 .2224 .3352 

+ 0.60 .2257 .3332 

+ 0.61 2291 3312 

+ 0.62 2324 +3292 

+ 0.63 2357 3271 

+ 0.64 .2389 .3251 

+ 0.65 .2422 .3230 

+ 0.66 .2454 .509: 3209 

+ 0.67 2486 502€ 3187 

+ 0.68 .2517 4965 3166 

+ 0.69 2549 4902 3144 

+ 0.70 .2580 4839 3123 

+0.71 2611 3101 

+ 0.72 2642 .3079 

+ 0.73 .2673 .3056 

+ 0.74 2704 3034 

+ 0.75 2734 3011 

+ 0.76 2764 .2989 22.36 

+ 0.77 2794 -2966 22.06 

+ 0.78 2823 -2943 21.77 

+0.79 | 2852 2920 2148 

+ 0.80 -2881 .2897 21.19 

, 


TABLE II Normal Curve Areas and Ordinates (Continued)


Corel Cor. 2 Cor. 3 Coi. 4 Cord. Cor. 6 Coi, 7. Cor. 8 
Propor- 
Propor- 3 hn x Н è 
EUN iy UN ў у ах И A 1 R 1 PR EE 
i03 eyon ofyat mw] of +2 of —z 
и +z 
A179 
4122 
4065 
› 
50.23 
80.51 
80.78 
81.06 
à 81.33 
66.70 | $1.59 
66.10 | 81.86 
55 8 
; — 0.93 
— 0.94 
— 0.95 
— 0.96 
— 0.97 
— 0.98 
— 0.99 
— 1.00 
t — 1.01 
+ — 1.02 
+ — 1.03 
^s ~ 1.04 
= 1.05 
+ 
+ — 1.06 
t — 1.07 
+ — 1.08 
" — 1.09 
— 0 
+ | 5 — 1.11 
+ 3131 ae 
+1 2107 kS 
+1 .2083 = fad 
+1 2059 EVE 
+ 1.16 2036 : 
PE 3012 E 
+ ы . 1080 49.85 11:90 
1 1965 | 4926 | 883 7 
+120 1942 | 4868 | 8849 ia 
+1.21 | .3809 .2263 1919 
+122 88 2225 Ў THU 
+ 123 е 
qo. 10.93 
+1.25 


TABLE II Normal Curve Areas and Ordinates (Continued)


Car. 2 Col. 3 Cor. 4 Cor. 5 Cor. 6 Con. 7 Cor. 8 


Propor- 


Propor- d 
тарот tion yasa < РЕ PR 


е А. е BIIN J of y atu of +2 of—z eum 
3962 2077 10.38 
3980 3041 10.20 
.3997 .2005 10.03 
A015 .1971 09.85 
1032 1936 09.68 
A040 .1902 .1691 90.49 09.51 
4066 -1669 90.66 09.34 
A082 1647 90.82 09.18 
A099 .1626 90.99 09.01 
A115 -1604 91.15 08.85 


.1738 0 
1707 0 
1676 0 


08. 
08.08 


.1497 37.53 

-1476 37.01 07.93 

156 E ) 07.78 

-1435 T 07.64 

1415 3. j 07.49 

.1394 34.95 07.35 
1.46 1279 07.21 
1.47 1292 07.08 
1.48 1306 06.94 
1.49 4319 06.81 
1.50 14332 06.68 


TEE FELE КЕРЕК FEPER +4444 $4444 


І 
| 
1 ; 
І ; .1236 06.18 
E 4394 1211 06.06 
pl 4406 4188 
+1. AHIS 1164 
+1 A429 KERTI 
+1; A441 1118 
+ 1.6 452 1096 
+ 1.61 1074 1092 27.36 
+ 1.62 .1052 1074 26.92 
+ 1.63 1031 1057 26.49 
+ 1.64 1010 1040 
+ 1.65 .0990 .1023 
+ 1.66 0969 100% > 
+ 1.67 .0949 0989 A 
+ 1.68 .0930 ‚0973 A 
+1.69 0910 0957 04:55 
+ 1.70 0891 0940 0416 


TABLE II Normal Curve Areas and Ordinates (Continued)


Cor. 1 Cor. 2 Cor. З Cor. 4 Con, 5 Cor. 6 Cor. 7 Cor. 8 


Propor- Propor- > > › 
tion tion j y asa % PR PR = 
spin Bevond ۴ ofyatgu| of +z of =2 
Ltoz а 
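+1.71 | .4564 | .0873 | .0925 | 23.18 | 95.64 | 04.36 | -1.71
+1.72 | .4573 | .0854 | .0909 | 22.78 | 95.73 | 04.27 | -1.72
+1.73 | .4582 | .0836 | .0893 | 22.39 | 95.82 | 04.18 | -1.73
+1.74 | .4591 | .0819 | .0878 | 22.01 | 95.91 | 04.09 | -1.74
+1.75 | .4599 | .0801 | .0863 | 21.63 | 95.99 | 04.01 | -1.75
+1.76 | .4608 | .0784 | .0848 | 21.25 | 96.08 | 03.92 | -1.76
+1.77 | .4616 | .0767 | .0833 | 20.88 | 96.16 | 03.84 | -1.77
+1.78 | .4625 | .0751 | .0818 | 20.51 | 96.25 | 03.75 | -1.78
+1.79 | .4633 | .0735 | .0804 | 20.15 | 96.33 | 03.67 | -1.79
+1.80 | .4641 | .0719 | .0790 | 19.79 | 96.41 | 03.59 | -1.80
+1.81 | .4649 | .0703 | .0775 | 19.44 | 96.49 | 03.51 | -1.81
+1.82 | .4656 | .0688 | .0761 | 19.09 | 96.56 | 03.44 | -1.82
+1.83 | .4664 | .0673 | .0748 | 18.74 | 96.64 | 03.36 | -1.83
+1.84 | .4671 | .0658 | .0734 | 18.40 | 96.71 | 03.29 | -1.84
+1.85 | .4678 | .0643 | .0721 | 18.06 | 96.78 | 03.22 | -1.85
+1.86 | .4686 | .0629 | .0707 | 17.73 | 96.86 | 03.14 | -1.86
+1.87 | .4693 | .0615 | .0694 | 17.40 | 96.93 | 03.07 | -1.87
+1.88 | .4699 | .0601 | .0681 | 17.08 | 96.99 | 03.01 | -1.88
+1.89 | .4706 | .0588 | .0669 | 16.76 | 97.06 | 02.94 | -1.89
+1.90 | .4713 | .0574 | .0656 | 16.45 | 97.13 | 02.87 | -1.90
+1.91 | .4719 | .0561 | .0644 | 16.14 | 97.19 | 02.81 | -1.91
+1.92 | .4726 | .0549 | .0632 | 15.83 | 97.26 | 02.74 | -1.92
+1.93 | .4732 | .0536 | .0620 | 15.53 | 97.32 | 02.68 | -1.93
+1.94 | .4738 | .0524 | .0608 | 15.23 | 97.38 | 02.62 | -1.94
+1.95 | .4744 | .0512 | .0596 | 14.94 | 97.44 | 02.56 | -1.95
+1.96 | .4750 | .0500 | .0584 | 14.65 | 97.50 | 02.50 | -1.96
+1.97 | .4756 | .0488 | .0573 | 14.36 | 97.56 | 02.44 | -1.97
+1.98 | .4761 | .0477 | .0562 | 14.08 | 97.61 | 02.39 | -1.98
+1.99 | .4767 | .0466 | .0551 | 13.81 | 97.67 | 02.33 | -1.99
+2.00 | .4772 | .0455 | .0540 | 13.53 | 97.72 | 02.28 | -2.00
+2.01 | .4778 | .0444 | .0529 | 13.26 | 97.78 | 02.22 | -2.01
+2.02 | .4783 | .0434 | .0519 | 13.00 | 97.83 | 02.17 | -2.02
+2.03 | .4788 | .0424 | .0508 | 12.74 | 97.88 | 02.12 | -2.03
+2.04 | .4793 | .0414 | .0498 | 12.48 | 97.93 | 02.07 | -2.04
+2.05 | .4798 | .0404 | .0488 | 12.23 | 97.98 | 02.02 | -2.05
+2.06 | .4803 | .0394 | .0478 | 11.98 | 98.03 | 01.97 | -2.06
+2.07 | .4808 | .0385 | .0468 | 11.74 | 98.08 | 01.92 | -2.07
+2.08 | .4812 | .0375 | .0459 | 11.50 | 98.12 | 01.88 | -2.08
+2.09 | .4817 | .0366 | .0449 | 11.26 | 98.17 | 01.83 | -2.09
+2.10 | .4821 | .0357 | .0440 | 11.03 | 98.21 | 01.79 | -2.10
+2.11 | .4826 | .0349 | .0431 | 10.80 | 98.26 | 01.74 | -2.11
+2.12 | .4830 | .0340 | .0422 | 10.57 | 98.30 | 01.70 | -2.12
+2.13 | .4834 | .0332 | .0413 | 10.35 | 98.34 | 01.66 | -2.13
+2.14 | .4838 | .0324 | .0404 | 10.13 | 98.38 | 01.62 | -2.14
+2.15 | .4842 | .0316 | .0396 | 09.91 | 98.42 | 01.58 | -2.15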


+2.16 | .4846 | .0308 | .0387 | 09.70 | 98.46 | 01.54 | -2.16
+2.17 | .4850 | .0300 | .0379 | 09.49 | 98.50 | 01.50 | -2.17
+2.18 | .4854 | .0293 | .0371 | 09.29 | 98.54 | 01.46 | -2.18
+2.19 | .4857 | .0285 | .0363 | 09.09 | 98.57 | 01.43 | -2.19
+2.20 | .4861 | .0278 | .0355 | 08.89 | 98.61 | 01.39 | -2.20
+2.21 | .4864 | .0271 | .0347 | 08.70 | 98.64 | 01.36 | -2.21
+2.22 | .4868 | .0264 | .0339 | 08.51 | 98.68 | 01.32 | -2.22
+2.23 | .4871 | .0257 | .0332 | 08.32 | 98.71 | 01.29 | -2.23
+2.24 | .4875 | .0251 | .0325 | 08.14 | 98.75 | 01.25 | -2.24
+2.25 | .4878 | .0244 | .0317 | 07.96 | 98.78 | 01.22 | -2.25
+2.26 | .4881 | .0238 | .0310 | 07.78 | 98.81 | 01.19 | -2.26
+2.27 | .4884 | .0232 | .0303 | 07.60 | 98.84 | 01.16 | -2.27
+2.28 | .4887 | .0226 | .0297 | 07.43 | 98.87 | 01.13 | -2.28
+2.29 | .4890 | .0220 | .0290 | 07.27 | 98.90 | 01.10 | -2.29
+2.30 | .4893 | .0214 | .0283 | 07.10 | 98.93 | 01.07 | -2.30
+2.31 | .4896 | .0209 | .0277 | 06.94 | 98.96 | 01.04 | -2.31
+2.32 | .4898 | .0203 | .0270 | 06.78 | 98.98 | 01.02 | -2.32
+2.33 | .4901 | .0198 | .0264 | 06.62 | 99.01 | 00.99 | -2.33
+2.34 | .4904 | .0193 | .0258 | 06.47 | 99.04 | 00.96 | -2.34
+2.35 | .4906 | .0188 | .0252 | 06.32 | 99.06 | 00.94 | -2.35
+2.36 | .4909 | .0183 | .0246 | 06.17 | 99.09 | 00.91 | -2.36
+2.37 | .4911 | .0178 | .0241 | 06.03 | 99.11 | 00.89 | -2.37
+2.38 | .4913 | .0173 | .0235 | 05.89 | 99.13 | 00.87 | -2.38
+2.39 | .4916 | .0168 | .0229 | 05.75 | 99.16 | 00.84 | -2.39
+2.40 | .4918 | .0164 | .0224 | 05.61 | 99.18 | 00.82 | -2.40
+2.41 | .4920 | .0160 | .0219 | 05.48 | 99.20 | 00.80 | -2.41
+2.42 | .4922 | .0155 | .0213 | 05.35 | 99.22 | 00.78 | -2.42
+2.43 | .4925 | .0151 | .0208 | 05.22 | 99.25 | 00.75 | -2.43
+2.44 | .4927 | .0147 | .0203 | 05.10 | 99.27 | 00.73 | -2.44
+2.45 | .4929 | .0143 | .0198 | 04.97 | 99.29 | 00.71 | -2.45
+2.46 | .4931 | .0139 | .0194 | 04.85 | 99.31 | 00.69 | -2.46
+2.47 | .4932 | .0135 | .0189 | 04.73 | 99.32 | 00.68 | -2.47
+2.48 | .4934 | .0131 | .0184 | 04.62 | 99.34 | 00.66 | -2.48
+2.49 | .4936 | .0128 | .0180 | 04.50 | 99.36 | 00.64 | -2.49
+2.50 | .4938 | .0124 | .0175 | 04.39 | 99.38 | 00.62 | -2.50
+2.51 | .4940 | .0121 | .0171 | 04.29 | 99.40 | 00.60 | -2.51
+2.52 | .4941 | .0117 | .0167 | 04.18 | 99.41 | 00.59 | -2.52
+2.53 | .4943 | .0114 | .0163 | 04.07 | 99.43 | 00.57 | -2.53
+2.54 | .4945 | .0111 | .0158 | 03.97 | 99.45 | 00.55 | -2.54
+2.55 | .4946 | .0108 | .0154 | 03.87 | 99.46 | 00.54 | -2.55
+2.56 | .4948 | .0105 | .0151 | 03.77 | 99.48 | 00.52 | -2.56
+2.57 | .4949 | .0102 | .0147 | 03.68 | 99.49 | 00.51 | -2.57
+2.58 | .4951 | .0099 | .0143 | 03.59 | 99.51 | 00.49 | -2.58
+2.59 | .4952 | .0096 | .0139 | 03.49 | 99.52 | 00.48 | -2.59
+2.60 | .4953 | .0093 | .0136 | 03.40 | 99.53 | 00.47 | -2.60



+2.61 | .4955 | .0091 | .0132 | 03.32 | 99.55 | 00.45 | -2.61
+2.62 | .4956 | .0088 | .0129 | 03.23 | 99.56 | 00.44 | -2.62
+2.63 | .4957 | .0085 | .0126 | 03.15 | 99.57 | 00.43 | -2.63
+2.64 | .4959 | .0083 | .0122 | 03.07 | 99.59 | 00.41 | -2.64
+2.65 | .4960 | .0080 | .0119 | 02.99 | 99.60 | 00.40 | -2.65
+2.66 | .4961 | .0078 | .0116 | 02.91 | 99.61 | 00.39 | -2.66
+2.67 | .4962 | .0076 | .0113 | 02.83 | 99.62 | 00.38 | -2.67
+2.68 | .4963 | .0074 | .0110 | 02.76 | 99.63 | 00.37 | -2.68
+2.69 | .4964 | .0071 | .0107 | 02.68 | 99.64 | 00.36 | -2.69
+2.70 | .4965 | .0069 | .0104 | 02.61 | 99.65 | 00.35 | -2.70
+2.71 | .4966 | .0067 | .0101 | 02.54 | 99.66 | 00.34 | -2.71
+2.72 | .4967 | .0065 | .0099 | 02.47 | 99.67 | 00.33 | -2.72
+2.73 | .4968 | .0063 | .0096 | 02.41 | 99.68 | 00.32 | -2.73
+2.74 | .4969 | .0061 | .0093 | 02.34 | 99.69 | 00.31 | -2.74
+2.75 | .4970 | .0060 | .0091 | 02.28 | 99.70 | 00.30 | -2.75
+2.76 | .4971 | .0058 | .0088 | 02.22 | 99.71 | 00.29 | -2.76
+2.77 | .4972 | .0056 | .0086 | 02.16 | 99.72 | 00.28 | -2.77
+2.78 | .4973 | .0054 | .0084 | 02.10 | 99.73 | 00.27 | -2.78
+2.79 | .4974 | .0053 | .0081 | 02.04 | 99.74 | 00.26 | -2.79
+2.80 | .4974 | .0051 | .0079 | 01.98 | 99.74 | 00.26 | -2.80
+2.81 | .4975 | .0050 | .0077 | 01.93 | 99.75 | 00.25 | -2.81
+2.82 | .4976 | .0048 | .0075 | 01.88 | 99.76 | 00.24 | -2.82
+2.83 | .4977 | .0047 | .0073 | 01.82 | 99.77 | 00.23 | -2.83
+2.84 | .4977 | .0045 | .0071 | 01.77 | 99.77 | 00.23 | -2.84
+2.85 | .4978 | .0044 | .0069 | 01.72 | 99.78 | 00.22 | -2.85
+2.86 | .4979 | .0042 | .0067 | 01.67 | 99.79 | 00.21 | -2.86
+2.87 | .4979 | .0041 | .0065 | 01.63 | 99.79 | 00.21 | -2.87
+2.88 | .4980 | .0040 | .0063 | 01.58 | 99.80 | 00.20 | -2.88
+2.89 | .4981 | .0039 | .0061 | 01.54 | 99.81 | 00.19 | -2.89
+2.90 | .4981 | .0037 | .0060 | 01.49 | 99.81 | 00.19 | -2.90
+2.91 | .4982 | .0036 | .0058 | 01.45 | 99.82 | 00.18 | -2.91
+2.92 | .4982 | .0035 | .0056 | 01.41 | 99.82 | 00.18 | -2.92
+2.93 | .4983 | .0034 | .0055 | 01.37 | 99.83 | 00.17 | -2.93
+2.94 | .4984 | .0033 | .0053 | 01.33 | 99.84 | 00.16 | -2.94
+2.95 | .4984 | .0032 | .0051 | 01.29 | 99.84 | 00.16 | -2.95
+2.96 | .4985 | .0031 | .0050 | 01.25 | 99.85 | 00.15 | -2.96
+2.97 | .4985 | .0030 | .0048 | 01.21 | 99.85 | 00.15 | -2.97
+2.98 | .4986 | .0029 | .0047 | 01.18 | 99.86 | 00.14 | -2.98
+2.99 | .4986 | .0028 | .0046 | 01.14 | 99.86 | 00.14 | -2.99
+3.00 | .4987 | .0027 | .0044 | 01.11 | 99.87 | 00.13 | -3.00
+3.01 | .4987 | .0026 | .0043 | 01.08 | 99.87 | 00.13 | -3.01
+3.02 | .4987 | .0025 | .0042 | 01.05 | 99.87 | 00.13 | -3.02
+3.03 | .4988 | .0024 | .0040 | 01.01 | 99.88 | 00.12 | -3.03
+3.04 | .4988 | .0024 | .0039 | 00.98 | 99.88 | 00.12 | -3.04
+3.05 | .4989 | .0023 | .0038 | 00.95 | 99.89 | 00.11 | -3.05


A080 .0022 00.93 99.89 00.11 
A080 002 i 00.90 99.89 00.11 
1990 .0021 .0035 00.87 99.90 00.10 
1990 0020 0034 00.84 99.90 00.10 
1900 0019 .0033 00.82 99.90 00.10 — 3.10 
«4991 10019 .0032 00.70 99.91 00.09 
19091 001% .0031 00.77 99.91 00.09 
.0017 .0030 00.75 99.91 00.09 
.0017 0029 00.72 99.92 00.08 
0016 .0028 00.70 99.92 00.08 
1 0016 ‚0027 00.68 99.92 00.08 
4 0015 0026 00.66 99.92 00.08 
E! 0015 .0025 00.64 99.93 00.07 
1 0014 .0025 00.62 99.93 00.07 
1 0014 0024 00.60 99.93 00.07 
ИДЕ] 0013 ‚0023 99.93 00.07 
1994 0013 0022 99.94 00.06, 
ANT 0012 0022 00.54 99.94 00.06 
1994 0012 .0021 00.53 99.04 00.06 
40904 .0012 .0020 00.51 99.94 00.06 
4994 0011 0020 00.49 00.06 
1995 10011 .0019 00.48 Б 00.05 
1995 0010 0018 00.46 99.95 00.05 
1995 :0010 0018 00.45 99.95 00.05 
4995 0010 0017 00.43 99.95, 00.05 
t: 1996 .0008 :0015 00.37 99.96 00.04 
5 a 10007 -0012 99.97 00.03 
Че 1997 .0006 .0010 99.97 00.03 
+ 199% .0005 .0009 00.22 99.98 00.02 
+ 1998 .0004 .0007 00.18 99.98 00.02 
+ 1998 .0003. .0006 00.15 99.98 00.02 
sf A999 0003 .0005 00.13 99.99 00.01 
+ A999 0002 0004 00.11 99.99 00.01 
+ A999 £0002 -0004 00.09 99.99 00.01 
T A999 0001 :0003 00.07 99.99 00.01 
+ 0001 .0002 00.06 90.99 00.01 
+ 0001 -0002 00.05 99.995, 00.01 
+ 0001 0002 00.04 99.996, 00.004 — 
+ 4 0001 -0001 00.03 99.997 00.003 — 4.00 
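
The entries of Table II can be regenerated from the standard normal distribution. The sketch below is an illustration and not part of the original text; the function name `normal_curve_row` and its dictionary keys are hypothetical. It assumes the column meanings suggested by the tabled values: area from the mean to z, area beyond both +z and -z, the ordinate y, y as a percentage of the ordinate at the mean, and the percentile ranks of +z and -z.

```python
import math


def normal_curve_row(z):
    """Return Table II style entries for a non-negative z (illustrative sketch)."""
    # Area of the unit normal curve below +z, from the error function.
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # Ordinate of the unit normal curve at z and at the mean.
    y = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    y_at_mean = 1.0 / math.sqrt(2.0 * math.pi)
    return {
        "area_mean_to_z": cdf - 0.5,
        "area_beyond_pm_z": 2.0 * (1.0 - cdf),   # both tails combined
        "ordinate": y,
        "ordinate_pct_of_max": 100.0 * y / y_at_mean,
        "pr_plus_z": 100.0 * cdf,
        "pr_minus_z": 100.0 * (1.0 - cdf),
    }


row = normal_curve_row(1.96)
for name, value in row.items():
    print(f"{name}: {value:.4f}")
```

Rounding the first two values to four places and the last three to two places gives entries in the same form as the printed rows.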


TABLE III Table of Normalized T-Scores*


PR 0 1 2 3 4 5 6 7 8 9
0 — 19 21 22 23 24 25 25 26 26 
1 27 27 27 28 28 28 29 29 29 29 
2 29 30 30 30 30 30 31 31 31 31 
3 31 31 31 32 32 32 32 32 32 32 
4 32 33 33 33 33 33 33 33 33 33 
5 34 34 34 34 34 34 34 34 34 34 
6 34 35 35 35 35 35 35 35 35 35 
7 35 35 35 35 35 36 36 36 36 36
8 36 36 36 36 36 36 36 36 36 37 

10 37 37 37 37 37 37 38 38 38 38 
12 38 38 38 38 38 38 39 39 39 39 
14 39 39 39 39 39 39 39 40 40 40 
17 40 40 41 41 41 41 41 41 41 41 
19 41 41 41 41 41 41 41 41 42 42 
22 42 42 42 42 42 42 42 43 43 43 
25 43 43 43 43 43 43 43 43 44 44 
29 44 44 45 45 45 45 45 45 45 45 
32 45 45 45 45 45 45 45 46 46 46 
36 46 46 46 46 47 47 47 47 47 47 
40 47 47 48 48 48 48 48 48 48 48 
44 48 49 49 49 49 49 49 49 49 49 
48 49 50 50 50 50 50 50 50 50 50 
51 50 50 50 50 50 50 50 50 50 50 
52 51 51 51 51 51 51 51 51 51 51 
55 51 51 51 51 51 51 51 51 51 51 
56 52 52 52 52 52 52 52 52 52 52 
59 52 52 52 52 52 52 52 52 52 53
63 53 53 53 53 53 53 53 54 54 54 
67 54 54 54 54 55 55 55 55 55 55 
70 55 55 55 55 55 55 55 55 55 56 
74 56 56 56 57 57 57 57 57 57 57 
77 57 57 57 57 58 58 58 58 58 58 
80 58 58 58 59 59 59 59 59 59 59 
82 59 59 59 59 59 59 59 59 59 60 
85 60 60 60 60 61 61 61 61 61 61 
87 61 61 61 61 61 62 62 62 62 62 
89 62 62 62 62 62 63 63 63 63 63 
91 63 63 64 64 64 64 64 64 64 64 
92 64 64 64 64 64 64 64 65 65 65 
93 65 65 65 65 65 65 65 65 65 65 
94 66 66 66 66 66 66 66 66 66 66 
95 66 67 67 67 67 67 67 67 67 67
96 68 68 68 68 68 68 68 68 69 69
97 69 69 69 69 69 70 70 70 70 70
98 71 71 71 71 71 72 72 72 73 73
99 73 74 74 75 75 76 77 78 79 81 


* Column headings indicate the tenths place in the PR-value.
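
The tabled T-values can be approximated directly. The sketch below is an illustration, not the authors' procedure: it assumes the usual definition of a normalized T-score, T = 50 + 10z, where z is the unit normal deviate whose percentile rank is the given PR. The function name `normalized_t_score` is hypothetical.

```python
from statistics import NormalDist


def normalized_t_score(pr):
    """Convert a percentile rank (0 < pr < 100) to a normalized T-score.

    Assumes T = 50 + 10z, with z the unit normal deviate below which
    pr percent of the area lies, rounded to the nearest whole number.
    """
    if not 0.0 < pr < 100.0:
        raise ValueError("pr must lie strictly between 0 and 100")
    z = NormalDist().inv_cdf(pr / 100.0)
    return round(50.0 + 10.0 * z)


for pr in (0.1, 10.0, 50.0, 95.0, 97.5, 99.9):
    print(pr, normalized_t_score(pr))
```

A PR of 0 or 100 has no finite z, which is consistent with the table leaving the entry for PR = 0.0 blank.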


TABLE IV Percentile Rank of a Normalized T-Score*


T |     0 |     1 |     2 |     3 |     4 |     5 |     6 |     7 |     8 |     9
1 |     - |     - |  0.01 |  0.01 |  0.02 |  0.02 |  0.03 |  0.05 |  0.07 |  0.10
2 |  0.13 |  0.19 |  0.26 |  0.35 |  0.47 |  0.62 |  0.82 |  1.07 |  1.39 |  1.79
3 |  2.28 |  2.87 |  3.59 |  4.46 |  5.48 |  6.68 |  8.08 |  9.68 | 11.51 | 13.57
4 | 15.87 | 18.41 | 21.19 | 24.20 | 27.43 | 30.85 | 34.46 | 38.21 | 42.07 | 46.02
5 | 50.00 | 53.98 | 57.93 | 61.79 | 65.54 | 69.15 | 72.57 | 75.80 | 78.81 | 81.59
6 | 84.13 | 86.43 | 88.49 | 90.32 | 91.92 | 93.32 | 94.52 | 95.54 | 96.41 | 97.13
7 | 97.72 | 98.21 | 98.61 | 98.93 | 99.18 | 99.38 | 99.53 | 99.65 | 99.74 | 99.81
8 | 99.86 | 99.90 | 99.93 | 99.95 | 99.97 | 99.98 | 99.98 | 99.99 |     - |     -


* Row headings indicate the tens digit and column headings the units digit of the T-values. 
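
The reverse conversion, from a normalized T-score to a percentile rank, can be sketched the same way. This is an illustration under the assumption that T = 50 + 10z, so that the percentile rank is 100 times the unit normal area below z = (T - 50)/10. The function name `percentile_rank_of_t` is hypothetical.

```python
from statistics import NormalDist


def percentile_rank_of_t(t):
    """Percentile rank of a normalized T-score, assuming T = 50 + 10z."""
    z = (t - 50.0) / 10.0
    # Area of the unit normal curve below z, expressed as a percentage.
    return 100.0 * NormalDist().cdf(z)


for t in (30, 43, 50, 57, 60, 70):
    print(t, round(percentile_rank_of_t(t), 2))
```

Because the normal curve is symmetric, the ranks for T-values equally distant from 50 (for example 43 and 57) sum to 100, which gives a quick check on any single tabled entry.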




TABLE V Ten Thousand Randomly Assorted Digits*


00-04 05-09 10-14 15-19 20-24 25-29 30-34 35-39 40-44 45-49
00 | 54463 22062 70639 29085 
01 | 15389 85205 39226 
02 | 85941 40756 82414 02015 
03 | 61149 69440 11286 88218 03638 
01 | 05219 81619 10651 67079 50888 
05 | 41417 98326 S7719 92291 46614 50948 
06 | 28357 94070 20652 16249 75019 
07 | 17783 00015 10806 91530 364 t 
08 | 40950 84820 29881 62800 81710 90279 
09 | 82995 64157 66164 10089 78258 37231 
10 | 96754 17676 : 9 47361 86679 27083 
11 | 34357 88040 53 45690 ў 71113 
12 | 06318 37403 19927 50423 80182 
13 | 62111 52820 07243 89292 $4767 2 11551 
14 | 47534 09243 67879 23410 12710 02510 32949 13491 
15 | 98614 75993 84460 62846 59844 14922 48730 73443 48167 34770 
16 | 24856 03648 44898 09351 98795 18644 39765 71058 90368 44104 
17 | 96887 12479 80621 66223 86085 78285 3342 42846 94771 
18 | 90801 21472 42815 77408 37390 76766 52615 3 18106 
19 | 55165 77312 83666 36028 28120 70219 81369 11067 
20 | 75884 12952 84318 95108 72305 64620 91318 
21 | 16777 37116 58550 42958 21400 43910 01175 
22 | 46230 43877 80207 88877 89380 32992 91380 
23 | 42902 66892 46134 01432 91710 23474 20423 (60137 60609 
24 | 81007 00333 39693 28039 10154 95425 39220 19774 31782 
25 | 68089 01122 51111 72373 06902 74373 96199 97017 41273 
26 | 20411 67081 50 16944 93054 87687 77054 
27 | 58212 13160 15718 82627 76999 ( 
28 | 70577 49866 61210 76046 67699 12006 93758 
29 | 94522 74358 71659 62038 79613 79109 05137 39038 
30 | 42626 86819 85651 88678 32 ‹ 
31 33763 57191 16752 ‘ 0 dm М 
32 27047 33851 4470; 46716 55781 
33 09419 89964 04894 17805 21896 
34 40293 09985 01412 69121 82171 59058 
35 | 98409 66162 95763 47420 20792 61527 20441 39435 11859 
36 | 45170 84882 65109 96597 25930 66790 65706 61903 33 
37 | 89300 69700 50741 30329 11658 23166 0: 66669 
38 | 50051 95137 91631 66315 91428 12275 21816 68091 33258 
39 | 31753 85178 31310 89642 98364 02306 24617 09609 22716 
40 | 79152 53829 20190 56535 18760 69942 77448 48805 
41 | 44500 38750 56540 64900 42912 79149 18710 ms 
42 | 68328 83378 71381 39561 05615 64559 
43 39 38689 08342 30459 85863 09284 
H 86141 15707 96256 23068 13782 08467 89169 
45 | 91621 00881 04900 54224 46177 55309 178: 491 89115 23466 
46 | 91896 67126 01151 03795 59077 11848 12030 98375 52008 60142 
17 21108 80830 02263 29303 96926 30506 09808 
18 95193 88812 00664 55017 17771 60448 875: 
49 12236 60277 39102 62315 07105 11844 01117 | 


* Reprinted from G. W. Snedecor, Statistical Methods, Fifth Edition, Iowa State College Press, 1956, by permission of the publisher.


TABLE V Ten Thousand Randomly Assorted Digits (Continued)
50-54 55-59 60-64 65-69 70-74 75-79 80-84 85-89 90-94 95-99

00 52098 04190 90164 29065 
01 d 43918 51141 
02 27101 33316 
03 32803 : 28612 
04 55368 34936 S0972 OSISS 
05 | 95008 88628 14530 80428 39930 31855 34334 64865 
06 | 54403 7 91017 71824 83671 GO518 37092 
07 16874 7 13215 80827 $2802 $4420 
08 7 91316 96: 01087 66091 
09 50689 39052 10814 35213 34471. 74441 
10 | 99116 75486 51989 52067 39495 39100 74073 
11 15696 10703 53988 71087 11670 
12 | 97720 15369 69620 33423 67453 56720 
13 11666 13841 98000 81899 07449 46967 
14 | 71628 73130 75691 09847 61547 18707 6004-1 
15 51089 91813 41995 88031 73631 69361 05375 15417 
16 82068 10708 86211 36584 60373 40051 
17 62173 14878 16783 86352 00077 
IN 98191 90813 15496 20168 09271 
19 02146 48228 72856 

20 10421 43648 

21 34434 54076 

22 70769 86413 

23 09134 63806 48: 

24 12197 09965 96657 59439 76330 

25 32883 29793 40914 65990 

26 80876 74798 - 

27 54481 78735 

28 30101 78295 55417 60048 

29 | 29287 69727 94443 64936 08366 27227 05158 50566 
30 | 74261 65172 80609 65340 
31 64081 18888 8975 от 210 
32 | 05617 67814 29575 44464 f 

33 | 26793 ‹ 74307 13330 20632 05497 

34 65988 48737 54719 52056 01596 35067 03134 

35 27300 42 44300 73399 21105 03280 43093 05192 

36 | 56760 10909 98147 34736 33863 95256 12731 66598 

37 | 72880 43338 93613 58904 59543 23913 11231 83268 

38 | 77888 38100 03062 47961 83841 25878 23746 

39 | 28440 07819 21580 47971 29882 13990 29226 23608 15873 
10 | 63525 94441 77033 12147 51054 58312 76023 96071 05813 
41 47606 93410 16359 89033 89696 64498 31776 05383 

12 | 52669 45030 96279 14709 7 02735 50803 72744 

43 16738 60159 62369 37875 21315 

^* | 59348 11695 15865 74739 32688 20271 65128 

45 12900 60774 94924 38636 67598 825 
16 | 75086 499: 13484 28617 707.19 - 
47 | 99195 29181 38190 68922 91077 40107 
B 26075 31671 45386 3 93450 48: 52022 60651 91321 
9 13636 93596 23377 51133 95126 61496 42474 46660 42338 

TABLE V Ten Thousand Randomly Assorted Digits (Continued)


00-04 05-09 10-14 15-19 20-24 25-29 30-34 35-39 40-44 45-49 


5 4249 63664 39652 40646 97306 31741 07294 84149 46797 82487 
НІ 36538 44249 04050 48174 65570 44072 40192 51153 11397 58212 
52 05845 00512 78630 55328 18116 69296 91705 86224 29503 57071 
53 74897 68373 67359 51014 33510 83048 17056 72506 82949 54600 
54 20872 54570 35017 88132 25730 22626 86723 91691 13191 77212 


55 31432 96156 89177 75541 81355 24480 77243 76690 42507 84362 
56 66890 61505 01240 00660 05873 13568 76082 79172 57913 93448 
57 48194 57790 79970 33106 86904 48119 52503 24130 72824 21627 
58 11303 87118 81471 52936 08555 28420 49416 44448 04269 27029 
59 54374 57325 16947 45356 78371 10563 97191 53798 12693 27928 


60 | 64852 34421 61046 90849 13966 39810 42699 21753 76192 10508 
61 | 16309 20384 09491 91583 97720 89846 30376 76970 23063 35894 
62 | 42587 37065 24526 72602 57589 98131 37292 05967 26002 51945 
63 | 40177 98590 97161 41682 84533 67588 62036 49967 01990 72308 
64 | 82309 76128 93965 20743 24141 04838 40251 20065 07938 76236 


65 | 79788 68243 59732 04257 27084 14743 17520 95401 55811 76099 
66 | 40538 79000 89559 25026 42274 23189 34502 75508 06059 80682 
67 | 64016 73598 18609 73150 62463 33102 45205 87440 96767 67042 
68 | 49767 12691 17903 93871 99721 79109 09425 26904 07419 74013 
69 | 76974 55108 29795 08401 82684 00497 51126 79935 37450 35071 


70 | 23854 08480 85983 96025 50117 64610 99425 62291 86943 21541 
71 | 68973 70551 25008 78033 98573 79848 31778 29555 61446 23037 
72 | 36444 93600 65350 14971 25325 00427 52073 64980 155910 dares 
73 | 03003 87800 07391 11594 21196 00781 32550 57158 5888" 73041 
74 | 17540 26188 36647 78386 04558 01463 57842 90382 77019 91910 


75 38916 55809 47982 41968 69760 79422 80154 91486 19180 15100 
76 64288 19843 69122 42502 48508 28820 59933 72998 99942 10515 
77 86809 51561 38040 39418 49915 19000 58050 16899 79952 57849 
78 99800 99566 14742 05028 30033 94889 53381 23656 75787 59293 
79 92345 31890 95712 08279 91794 94068 49337 88674 35355 12267 
80 90363 65162 32245 82279 79256 80834 06088 99462 56705 06118 
81 64437 32242 48431 04835 39070 59702 31508 60935 22390 52246 
82 91714 53662 28373 34333 55791 74758 51144 18827 10704 76803 
83 20902 17646 31391 31459 33315 03444 55743 74701 58851 27427 
84 12217 86007 70371 52281 14510 76094 96579 


54853 78339 20839 
85 | 45177 02863 42307 53571 22532 74921 17735 42201 540 54721 
86 | 28325 90814 08804 52746 47913 54577 47523 80520 DU 


77705 95 66 
87 29019 28776 56116 54791 64604 08815 46049 71 186 $1030 id 994 
88 84979 81353 56219 67062 26146 82567 33122 14194 46240 92973 
89 50371 20347 48513 63915 11158 25563 91915 18431 92978 11591 


90 53422 06825 69711 67950 64716 18003 49581 45 

91 67453 35651 89316 41620 32048 70225 47597 39197 31449 B 445 
92 07294 85353 74819 23445 68237 07202 99515 62282 53809 26685 
93 79544 00302 45338 16015 66613 88968 14595 63836 77716 79596 
94 64144 85442 82060 46471 24162 39500 87351 


36637 42833 71875 
95 | 90919 11883 58318 00042 52402 28210 34075 3 
96 | 06670 57353 86275 92276 77591 16924 60830 by gorn me 
97 | 36634 93076 52062 83078 41256 00948 18683 48992 19462 96062 
98 | 75101 72891 85745 07106 26010 62107 6088 


37503 55 13 
99 05112 71222 72654 51583 05228 62056 57390 42746 5201 ce 


TABLE V Ten Thousand Randomly Assorted Digits (Concluded)
50-54 55-59 60-64 65-69 70-74 75-79 80-84 85-89 90-94 95-99
50 32847 31282 89593 69214 78285 
51 16916 00041 55023 14253 12092 
52 66176 34047 27137 03191 
53 46299 13335 16861 38043 5 
54 22847 47839 23289 47526 5 1098 45683 64689 
41851 54160 69936 34803 92479 33399 71160 64777 83378 
28444 50407 95917 68553 28639 34174 11130 91994 
47520 62378 83174 13088 10501 59 26679 06238 51254 
E 34978 63271 82681 05271 08822 06490 44984 49307 62717 
59 37404 80416 92980 49486 74378 75610 74976 70056 15478 
60 32400 65482 52099 74648 65095 71551 
61 89262 86332 51718 6: 79820 84886 03591 
62 86866 09127 44832 40672 30180 
63 90814 14833 99094 32663 73040 
64 19192 82756 58446 75096 83898 43816 
65 77585 52593 56612 95766 10019 29531 73064 53523 58136 
66 23757 16364 05096 03192 62386 45389 55710 96459 
67 45989 96257 23850 26216 23309 21526 19455 29315 
68 92970 94243 07316 41467 64837 52406 31220 14032 
69 74346 59596 40088 98176 17896 86900 19099 48885 
70 87646 41309 27636 45153 29988 94770 07255 70908 05340 99751 
71 | 50099 71038 45146 06146 55211 99429 43169 66259 59180 
72 10127 46900 64984 75348 04115 33624 68774 60013 35515 62556 
73 | 67995 81977 18984 64091 02785 27762 42529 97144 80407 64524 
74 26304 80217 84934 82657 69291 35397 98714 35104 08187 48109 
75 81994 41070 56642 64091 31229 02595 13513 45148 78722 30144 
76 59537 34662 79631 89403 65212 09975 06118 86197 58208 16162 
77 51228 10937 62396 81460 47331 91403 95007 06047 16846 64809 
78 | 31089 37995 29577 07828 42272 54016 21950 86192 99016 84864 
79 38207 97938 93459 75174 79460 55436 57206 87644 21296 43395 
80 88666 31142 09474 89712 63153 62333 06140 42594 43671 
81 53365 56134 67582 92557 89520 33452 70628 27612 33738 
82 89807 74530 38004 90102 11693 90257 79920 62700 43325 
83 18682 81038 85662 90915 91631 22223 91588 80774 07716 12548 
84 63571 32579 63942 25371 09234 94592 98475 76884 37635 33608 
85 68927 56492 67799 95398 77642 54913 91853 08424 81450 76229 
86 56401 63186 39389 88798 31356 89235 97036 32341 33292 73757 
87 | 24333 95603 02359 72942 46287 95382 08452 62862 97869 71775 
88 | 17025 84202 95199 62272 06366 16175 97577 99304 41587 03686 
89 | 02804 08253 52133 20224 68034 50865 57868 22343 55111 03607 
90 08298 03879 20995 19850 73090 13191 18963 82244 78479 99121 
91 59883 01785 82403 96062 03785 03488 12970 64896 38336 30030 
92 46982 06682 62864 91837 74021 89094 39952 64158 79614 78235 
93 31121 47266 07661 02051 67599 24471 69843 83696 71402 76287 
94 97867 56641 63416 17577 30161 87320 37752 73276 48969 41915 
95 57364 86746 08415 14621 49430 22311 15836 72492 49372 44103 
96 | 09559 26263 69511 28064 75999 44540 13337 10918 79846 54809 
97 | 53873 55571 00608 42661 91332 63956 74087 59008 47493 99581 
98 | 35531 19162 86406 05299 77511 24311 57257 22826 77555 05941 
99 | 28229 88629 25695 94932 30721 16197 78742 31974 97528 45447 




TABLE VI Probability Points of t-Curves* 


d0. 05. 025 01 005 .001 0005 
df 20 10 00 02 00 ° 002 000 
1 1.000 138 3.08 631 12.71 31.82 6366 318.31 
2 816 106 189 292 430 697 993 2233 
3 765 0.98 164 235 318 454 584 1021 
4 AL 0.94 1.53 213 2.78 3.75 4.60 TAT 
5 727 0.92 148 202 257 337 5.89 
6 418 0.91 144 1.94 245 3.14 5.21 
-11 090 142 190 237 3.00 4.76 
8 706 0.89 140 186 231 2.90 
9 703 0.88 138 183 226 282 
10 700 088 1.37 181 223 2.76 
11 697 088 136 180 220 272 
12 695 087 136 178 218 268 
13 694 0.87 1.35 177 216 2.65 
14 692 087 1.35 176 215 2.62 
15 691 O87 134 175 2 2.95 
16 690 0.87 134 175 2. 2.92 
17 689 0.86 1.33 1.74 2. 2.90 
18 688 0.86 1.33 1.73 2. 2.88 
19 688 0.86 1.33 1.73 2) 2.86 
20 687° Об 133. 178 © 2.85 
21 686 0.86 1. l2 5j 2.8: 
22 686 0.86 132 1.72 2 282 
23 685 0.86 132 171 2 2.81 
24 685. O86 139. ia 2 2.80 
25 684 0.86 1.32 171 2 2.79 
26 684 0.86 1.32 171 2, 2.78 
27 684 0.86 1.31 170 2, 2.77 
28 H 1 1.70 2.07 2.76 
29 1.31 1.70 2. 2.76 
30 1.3] 170  * 5 
40 1.30 168 2 2m 
60 130 1.67 2, 2.66 
120 129 166 i. 2.62 
I: “ 1.28 1.65 [^ 2.58 3.00 


* Abridged from Table 12 in E. S. Pearson and H. O. Hartley, eds., Biometrika Tables for Statisticians, Volume 1, and reprinted with the permission of the publisher, Cambridge University Press. The point values in the first two columns are reprinted in abridged form from Table III of R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural, and Medical Research, published by Oliver and Boyd, Limited, Edinburgh, by permission of the authors and publishers.


TABLE VII Values of z_r for Various Values of r*


r 


r 


.000 
.005 
.010 
.015 
.020 


085 


.090 
005 
.100 
.105 


:000000 
-005000 
-010000 
15001 


030009 
035014 
040021 
:015030 
.050042 
1055056 


.060072 
.065092 
070115 
075141 
080171 
.085205 


090244 


120581 
125657 
130740 
‚135829 
140925 
146029 


5 
.300 
.305 
310 
3315 


"326086 


331646 
387227 
342828 
3484149 


092 


359756 


84699 
1190987 


“510069 
1516506 


597123 
604154 
(611240 


-693146 
-700995 
-708920 
716922 
735004 
733167 


741415 
‚7497149 


"784006 


.792812 
.801723 
810741 
.819870 
.829112 
838472 


1877171 
1887182 
897338 


-907643 
-918104 
928725 


477 


961621 


150 3 :618380 | .750 972953 
155 371 1 A455 954451 
160 376 760 E: 

165 .382642 65 

170 j .388422 s O L 1020320 
175 .176820 .394228 .654959 | .775 1.032725 
-180 .181982 .400059 -662461 | .780 1.04: 5368 
185 3 › 405916 -670029 | .785 1.058265 
190 411799 -677665 | .790 1.07142% 
.195 417710 .685370 | .795 1.081873 

= 


.800 
-805 
-S10 
S815 
.820 
.825 


1.738045 
1.782838 


* Taken from E. F. Lindquist, Statistical Analysis in Educational Research, Boston, Houghton Mifflin Company, 1940.
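
Values of this kind can be computed rather than looked up. The sketch below is an illustration, assuming that the tabled quantity is Fisher's logarithmic transformation of r, z = (1/2) ln[(1 + r)/(1 - r)], which is the inverse hyperbolic tangent of r. The function names `fisher_z` and `r_from_fisher_z` are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher's logarithmic transformation of a correlation r, with -1 < r < 1."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and +1")
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def r_from_fisher_z(z):
    """Inverse transformation: recover r from a transformed value z."""
    return math.tanh(z)


for r in (0.050, 0.300, 0.600, 0.800):
    z = fisher_z(r)
    print(f"r = {r:.3f}  z = {z:.6f}  back to r = {r_from_fisher_z(z):.3f}")
```

For small r the transformed value is nearly equal to r itself, and it grows without bound as r approaches 1, which is the pattern visible in the tabled values.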




INDEX 


Absolute value: 204 
of deviations from mean, 138 
of deviations from median, 117-118 — 
Accuracy of prediction: see Prediction problem, accuracy of prediction
Additivity of scales, 66-67
Aggregate proximity
and fitting regression lines, 414-415
and median, 116-118
American Institute of Public Opinion, 233-235
A 38-39, 220
of histogram 38-39, 22
and normal distribution, 181, 187-193, 211
and t-distribution, 340, 341-343
(See also Area transformations; Extreme area)
Area transformations: 219-232
and assigning letter grades, 227-232
defined, 219-220

Arithmetic mean: see Mean, arithmetic 
Average deviation: see Mean deviation 
Averages: 98-12 
joint use of, 127-128 
(See also Mean; Median; Mode) 
Averages, selection of: 98-103, 119-128 
and expected 27 
and extreme 2 
and interest in total rather than typical, 
121-122 
and multimodal distributions, 122-123 
summary of considerations for, 127-128 
Axes: 
of histogram, 2: 
and ogive, 79-80 
and scatter diagram, 364 


Bias in sampling: 154, 243-244, 258 

defined, 240-241 

and sampling error, 240 

and variance, 258 

(See also Unbiased estimates) 
Bimodal distributions, 17, 27-28, 29, 3; 
Bivariate frequency distribution: 370-372 

and calculation of Pearson r, 391-399 

defined, 370 

grouped 1-372, 304 

marginal distributions in, 371 

and scatter plots, 370, 371 

(See also Normal bivariate model) 


Cahalan, J. D., 244 
Centiles, 72 


Central-limit theorem, 249 


520 


‚ 178-179 
Interval 


interval, midpoint of: sce 


and continuous and discrete data, 19 
and measurement to last unit, 21, 34 
and measurement to nearest unit, 20-21, 33 
and midpoint or index value, 22 

and real limits, 20-21 


selection. in frequency distribution: 
28-40 


Cl 


and computation of statistical indexes, 


and generalizing about form of a distribu- 
tion, 28-32 

and grouping error, 32, 108 

and markedly skewed distributions 


and scores concentrated at equall 
points, -36 


Cochran, W. G., 356 
Coefficient:
confidence, see Confidence interval
correlation, see Pearson r
regression, see Regression coefficient
able measures: see Percentile ranks; Standard scores

Composite Scores from test battery, 172-176 
Computatio: e 


1 formulas and procedure 
or estimated standard error of regri 
coefficient, 446 


for estimated stand 
Y. tercept, 447 

struction in s 
1,1 
for median, 131 -13: 
for Pearson r, 389 399 
ог percentile ranks, 72-76 
or percentiles, 76-83, 
for standard error of estimate, 428—430 


or sum of products of deviations from 
mean, 389 


for sum of Scores 
grouped frequ 


ard error of regression 


atisties, 6-7 


and squared scores in 
REG ency distributions, 51 — , 
ОГ sum of scores and squared scores И} 


Я relative frequency di ributions, 56 
ia of squared deviations from mean 


for esta 343, 348, 350 г 
"тар {nec and standard deviation. 


Computational results, symbolic statement 
f: 3 


in any е i ; 
in АШУ collection of scores. 48—50 
1 Gata contain} 


SAL iple measures pef 
individua], wA multiple measures 




Computational results (cont.) 
in data organized in subsets 
in frequency distribution, 51— 
in relative frequency distribution, 55-56 

Confidence coefficient: see Confidence intervals

Confidence intervals: 
and confidence coefficient, 324, 
definition of, 323-327 
for difference between population means, 

359-360 
for individual prediction in 

situation, 456-459 
for population Pearson r, 46 
for population mea 2T 
for population median, : 
for population proportion: Б 
probability interpretation of, 
for regressi 
for regression 
und sample size, 325 
and size of standard error, 334 t 
for subpopulation Y-me: in regression 

situation, 454-456 
and t-statistic, 357-360 

Constants and summation, 

Continuous curve, IS1-18 

Continuous dat 
and class limi 
defined, 19-20 

irement to last unit, 21- 

irement to nearest unit, 
and psychological. and educational test 

scores, 21-22 
and sampling theory 
and medians, 250 

Control of variables: see Equated groups

Correlation: 5, 361-406 
coefficient, see Pearson r 
curvilinear, see Curvilinear regression
defined, in general, 361 
degree of, and scatter diagram, 365-370 
direct, or positive, 362 
ind ‘of, need for, 36 
inverse, or negative, 362, 438 
linear, see Linearity of regression 
negative and positive, distinguished, 362 
perfect, 361, 362, : 
trend lines in, 377- 

Cox, G. M. 5 

Critical region:
choice of, 270-271
defined, 267
and t-test with skewed distributions, 356
and Type II error, 284-286, 297 

Cronbach, Lee J., 

Cross-validation, 

Cumulative frequency: 
of an interval, 73 
of an interval midpoint, 73 
and ogives, 79-89; see also Ogive —— 

Curvilinear correlation: see Curvilinear regression

Curvilinear regression: 419 
and degree of relationship, 386 
examples of, 387 
and Pearson r, 386— 


320-330 


regression 


19-22 


22 
20-22 


concerning means 


, 276-277, 280, 297 


IG 


Deciles, 71, 89-93 
Degrees of freedom: 
of any statistic, 340-341. 
and tstati 341, 343, 
447, 457—458 
(See also Sample size) 
Dember, William N., 368 


347-348, 351, 




Deming, E. W., 244 
De Moivre, Abraham 
Descriptive statistics: 
and frequency distributions, 12-19 
Deviations 
and inc 
from mean, 
419-42 
from median, 116-118 
of obtained point from regression line, 
414-416, 423, 427, 429-430, 440 
sum of products of, from mean, 389-390 
sum of squares of, from mean 14 5 
Dichotomous | populations, 
275-281 
Differences between random 
sampling distribution of, 
standard error of, 254 
Direct correlation: see Correlation, direct 
Discrete data: 
defined, 19-20 
and real limi 


xes of variability 
114-115, 1; 


140-145, 160, 


variables: 
3-256 


, midpoints, 22 


and sampling theory concerning means 
and medians, 250 
Double-entry tables: see Tables, double- 
entry 


Efficiency of a statistic, 296 
Equated groups: 
and power of a test, 313-319 
and t-tests, 349-353 
Errors: 40 
of estimation, distribution of, 178-179 
grouping, see Grouping error 
in hypothesis-testing, see Type I errors; 
Type II errors 
in measurement, see Measurement, errors 
of prediction, in regression situation, 419, 
S, 457; see also Standard error of 


estimate 
sampling, see Sampling errors 
Estimation: 
errors of, distribution of, 178-179 
interval, see Interval estimation 


n | grouped data, 101, 108 
79-8 


of mod 
and ogive, 
of parameter: 
of percentile rank 
79-83 
of percentiles, in grouped data, 76-83
point, or single-valued approach to, 322;
see also Unbiased estimates
of “population standard error of estimate, 
428, 430-43, 
standard error of, see Standard error of 
estimate 
of standard error of regression coefficient,
446 
of standard error of regression Y-intercept, 
447 
of true percentile ranks, 70 
unbiased, see Unbiased estimates 
of variance е, in grouped data, 145 
Exact tests 36-337 
and comparison of {- and 2-tests, 
356 
Expected value: 257 
defined, 126 
and selection of averages, 123-127 
Extreme area: 
and level of s 


5 
n grouped data, 74-76, 


gnificance, 312-313 


reporting of, 312-31: 
Extreme scores: 
and mean, 120, 151 


and variance, 151 


521 


Feldt, Leonard viii 
Fisher, R. A., viii, 463, 464, 516 
Fisher's logarithmic transformation for Pear- 
son r: 462-465 
and confidence interval for Pearson r, 465 
defined, 463 
and sample size, 463 
and testing a non-zero hypothesis about
population Pearson r, 464-465
and test of difference between two values
of Pearson r, 465-468
use of, 463-464, 465-468
variance of, 463 
Fitting: 
of linear regression equation, 412-416 
of normal curve to observed frequency 
distribution, 208-2 
Forms of distributions: 
examples, described, 27-28 
and interpretation of z-scores, 168-171 
and linear transformations, 168 
and quartile points, 93-94 
and scales of measurement, 216-218 
and selection of classes, 28-32 
and transformations of test scores, 217-218 
(See also Bimodal distribution; Multi- 
modal distribution; Rectangular dis- 
tribution; Skewness of distributions) 
Freeburne, Cecil M., 301 
Frequency distribution: 12-46 
bivariate, see Bivariate frequency distri- 
bution 
and class limits, 20-22 
and computation of mean, 104—107 
and computation of the variance of, 144— 
l 


cumulative, see Cumulative frequency 
defined, 18 

of errors of estimation, 178-179 

forms of, see Forms of distributions 


graphical representation of, , 30-32 
grouped, see Grouped frequency distri- 
butions 


homogeneous and heterogeneous, 94-95 
and interval size, 17, 18-19 
and probability distributions, 195 
and selection of classes, 28-40
usefulness of, 15
(See also Class selection in frequency
distribution)
Frequency polygons, 25-26, 30-32
Fundamental measuring scale, 65-66, 87,
157, 162


Gallup Poll, 233, 234, 235
Generalization:
defined, 28
to hypothetical populations, 302-305
Geometric mean, 99
Goodness of fit: 414-416, 427
and least-squares criterion, 415-416
Gosset, William Sealy, 338
Grant, David A., viii
Graphical representation:
and comparing variability of two frequency
distributions, 42-46
of correlated variables, see Scatter diagram
of frequency distribution:
frequency polygon, 2.
histogram, 23-25
and ogives, see Ogive
of t-distribution, 339
Grouped frequency distributions:
and computation of mean, 10.
and computation of median, 131-133
and computations, 51-52 
and mode, 101 
and percentile ranks, 7- 
and percentiles, 76- 
and variance, 144-147 
(See also Grouping error) 

Grouping error: 32, 36 
and class selection, 32-33, 108 
in computation of mean, 105-108 
in median, 128-131 
in mode, 108 
in Pearson r, 394 
and Sheppard’s correction, 146-147 
in variance, 145-147 

Guilford, J. P., 434 


Harmonic mean,
Hartley, H. O., 502, 516
Heterogeneity: see also

Histogram: 25, 38, 42-46 
of relative frequency distribution, 45 
scales for plotting, 45-46 
Homogeneity: 
of frequency distribution, 46, 94-95
of populations, and normal distribution, 
215-216 
of variance, see Variance, equality of 
Homoscedasticity: 
defined, 428 
and sampling distribution of regression
statistics, 444-445
and scatter diagrams, 428-429 
and standard error of estimate, 428 
Hypotheses, testing of: 264— 
arbitrary aspects of, 296-297 
and critical region, see Critical region 
and exact test, 336-337 
examples of, 268-281, 301-319 
and indirect proof, 266 
and interval estimation, 321 
and level of significance, see Level of
significance
and linear regression and correlation,
443-452, 454, 464-465, 465-468
and power, see Power of statistical test
and sample size, see Sample size, and
hypothesis-testing
with small samples, 335-356
steps in, 266-268
Type I and Type II errors in, see Type I 
errors; Type II errors 
(See also t-test) 
Hypothetical population, 302-305 


Independence, and random sampling, 254
Index value of interval: see Interval mid-
point
Indirect proof, 264-266 
Individual difference, 
and equated groups, 315-316, 351 
and overlapping distributions, 85-89 
Inflection, points of normal curve, 184 
Integral limits of intervals: 2 21 
and location of intervals, 33-34 
Interpolation, in tables, 205-208 
Interval estimation: 321-3:
based on t-statistic, 357-360, 452-453,
and confidence interval, 323-334 
and point estimation, 322 
(See also Confidence interval) 




Interval midpoint:
cumulative frequency of,
and grouping error, 106-107
and selection of classes,

Interval size, 17, 18-19; see also Class selection in frequency
distribution

Inverse correlation: see Correlation, inverse 


Jones, H. E., 385 


Least squares, method of: see Regression 
equation, linear case fitting by method 
of least squares 

Letter-grades, assignment of, 2: 

Level of significance:

defined, 267,

and exact test,

selection of, 281-2

and Type I errors, 282
Limits: 

confidence, see Confidence Intervals 

of summation, 49-50, 58 

Linear correlation: see Linearity of regression

Linear function, 412-413; see also Regression 
equation, linear case 

Linear interpolation: see Interpolation, in 
tables 

Linearity of regression: 365 

defined, 381-382 

and degree of correlation, 382 

and Pearson r, 388 

and prediction problem, 410-416, 419-422 

and regression equation, see Regression
equation, linear case 

and the regression phenomenon, 436 

and sampling distributions of regression
statistics, 444-445

and sampling errors, 382 

and scatter plot, 382 

Linear transformations: 219 

defined, 160 

and effect on Pearson r, 396-399 

and form of distributions, 168, 170, 219 

and z-scores, 160-162 

Literary Digest poll, 243-244 

Location, indexes of: see Averages 

Lorge, Irving, 193 


Marginal distribution, 371 
Matched groups: see Equated groups

Mathematical expectation: see Expected 

value 


Mathematical theory:
and instruction in statistics, 6
and sampling distributions, 246-247
Mean:
geometric,
harmonic,
Mean, arithmetic: 101-115

of collection of scores in subsets, 109-111 

computation of, 103-108, 126 

confidence interval for, 327-330, 358 

defined, 102 

deviations from, see Deviations, from mean

and expected value, 126 

and extreme scores, 120, 151 

and interest in total rather than typical, 
121-122 

rules regarding, 108-115 

sampling distribution of, 246-249, 250 

of sampling distribution of Fisher's log 
transform for Pearson r, 46 

of scores plus a constant, 112-113 

of scores times a constant, 113-114 


, 99 


standard error of, 


scores, 158-159, 162, 175 
Mean deviation, 138-139
Mean difference:
and difference between means, 316-317, 
0 
standard error of, 315, 317, 350
Means, difference between: 
confidence interval for 
and mean difference, 316
sampling distribution of, 


‚ 359-300 
0 


tests of hypothe 
319 


and t-statis 46 360 , 
Me i see also Variance 
Measurement:


and additivity, 66-67 

and the continuous-discrete distinction, 
19-20 

and educational and psychological tests,
87

errors in, and test scores,

and fundamental scales, 6, 87, 162 

and rank-order scales, 65-68, 87, 162 

and rectangular distributions, 218 

reliability of, and indexes of variability,
134-135, 153-156, 178

and rounding, see Rounding, of measure- 
ments 

scales of, and forms of distributions, 216- 
218 


units of, 20, 65-67, 93, 135, 172 
Median: 71, 102-103, 121 

confidence interval for, 330-331 

defined, 101 

and extreme scores, 119-121 

grouping error in, 128-131 

minimum information for computation,
131-133

and multimodal distributions, 122

sampling distribution of, 249-251

and skewness,

sum of deviations from, 116-118

standard error of, 250

and "typical" score, 118, 120-121

unbiased estimate of population value of,


Meredith, Howard V., 214 
Midpoint of interval: see Interval midpoint 
Mode: 99-101 

and extreme scores, 119-121 

and grouping error, 108 
Models, 177-179, 198-199, 461-462 
Multimodal distributions, 27, 100, 122-123 


Negative correlation: see Correlation, inverse 
Normal bivariate model: 461—462 
and Fisher’s log transform of Pearson r, 463 
and regression (prediction) model, 461 
and sampling distribution of Pearson r, 462 
and testing hypotheses about Pearson r, 
464, 467 
Normal curve: 177-232 
area relationships in, 187-190 
error distribution, 179, 182, 185
observed frequency distribution,



ш of, as model, 212-218 
mathematical function for, 179-183 
ordinates of, 181, 183, 186-187, 190-193 
as probability distribution model, 198-199 
properties of, 181-190
as relative frequency distribution, 181
in standard-score form, 182-183
use of tables for, 190-193, 199-208
(See also Normality; Normalized distributions)
Normality: 
and assigning letter-grades, 
assumed for t-tests, 338, 343, 346-347, 350,
353-356

and inaccuracy of test scores, 218

and normalized distributions, 2: -230

and sampling distribution of differences
between means, 254

and sampling distribution of Fisher's log
transform of Pearson r, 463, 467

and sampling distribution of means, 248-
249


and sampling distribution of medians, 249

and sampling distribution of Pearson r, 402 

and sampling distribution of proportions, 
252 


of subpopulations in regression situation, 
445 


and use of t-statistic in interval estimation, 
357-360 


Normalized distributions, 219, 223-230 
Norms, 83-85
Notation, statistical, 64 


Ogive: 79-89 
and area transforms, 221-2 
defined, 79 
and estimating percentiles and percentile 
ranks, 81-85 
“smoothing” of, and estimating population 
percentiles and percentile ranks, 83-85 
Ordinates: 
of normal curve, 181, 183, 186-187, 190-
193


of t-distribution, 338-339 
Overlapping distributions, 85-89 


Parameter: 
defined, 8 
interval estimation of, 321-334, 357-360, 

452-459, 465 
point estimates of, 322 
unbiased estimates of, 257-261

Pearson, E. S., viii, 2, 516 

Pearson r: 73 
causal interpretations of, 404-406 
comparison of values of, 40: 
computation of, 389, 391-. 
confidence interval for, 465 
defined, 374 
and degree of relationship, 403 
effect of curvilinear regression on, 386- 
effects of transformations on, 396-399
effect of variability on, 4004
Fisher's logarithmic transformation of,

462-468 
grouping error in, 394 
as index of accuracy of prediction, 423-4 
459 
interpretation of, 402-406 
and normal bivariate model, 462-464, 467 
properties of, 374-377 
as ratio of two standard deviations, 424 
as ratio of two sums of squares, 440 


388 


and regression coefficient, 420 
and regression phenomenon, 438 
sampling distribution of, 462
and standard error of estimate, 430-434
test of difference between two values of,
465-468
and test of non-zero hypothesis about,
464-465
t-test of null hypothesis for, 450-452
and z-scores, 3 ‚ 420-421 
Percentile norms, 83-8 
Percentile ranks: 157, 168 
and composite scores, 175-176 
computation of, 72-76
computation of, in grouped frequency dis- 
tributions, 74-76 
defined, 68-69 
and normal curve, 187-190 
and norms, 83-85 
and population values, 83-85 
and standard scores, 170-171 
and use of ogive, 79-85 
Percentiles: 129 
computation of, 76-83, 133 
defined, 70
estimation of population value of, 83-85 
indeterminate, 79 
special cases of, 71-72 
and units of measurement, 93
and use of ogive, 79-85 
Percentiles, distance between: 
and form of the distribution, 9: 
and frequencies along scale, 89-93 
as indication of variation among measures, 
94-07
Point estimates of parameter: see
also Estimation, of true percentile ranks;
Unbiased estimates
Poisson distribution,
Population: 
defined, 
dichotomous, see Dichotomous populations 
homogeneity of, and normal distribution,
215-216 
hypothetical, generalization to, 302-305 
unit of, 236 
(See also Universe and probability; Sampling)
Positive correlation: see Correlation, direct
Power curve: 288-296 
defined, 288 
and t-tests, 2 
Power of statistical test:
curves of, see Power curve
defined, 287 
and equated groups, 319, 
and esi in ample 
and sample size, 345-346, 353, 356
of standard error, 291-296 
and t-test, 345, 352-35:
and Type II errors, 4
Prediction problem: 5, 407-442
and accuracy of prediction, 423-43
and cross-validation, 459-461 
described, 407-408 
escribe 
441 


-346 
al tes 
ower curve 


7-296 


301 


7, 313 


d in regression terminology, 43% 


and linear regression, 410-416, 419 
and prediction when Pearson r is small,
and reduction of prediction error with
changes in Pearson r, 432-434
and the regression phenomenon, 435-435
sampling theory concerning, 443-468
unsatisfactory solution of, 408-409


(See also Linearity of regression; Regres- 
sion equation, linear; Standard error of 
estimate) 


Predictor 
Probability: 93-15
addition rule for, 196-197 
applied to outcome of an uncertain event, 
197-198 
defined, 193 
distributions, 195 
Probable error, 459 
Product-moment correlation coefficient: see 
Pearson r 
Products, sum of, and indexes of correlation,
373-374, 389-0
Proportionality property of standard scores: 
see Standard scores, proportionality 
property of 
Proportions: 
confidence interval for, 331-332 
differences between, confidence interval
for, 334
differences between, sampling distribution
of, 256
sampling distribution, effect of size of
theoretical proportion upon, 279
sampling distribution of, 251-253, 279
standard error of, 252, 276, 331
testing differences between, 319-320
testing hypotheses about, 275-281
unbiased estimate of population value of,
257
Public opinion survey: see Sampling, and
public opinion surveys


AL, 443


Quartiles, 71, 93-94; see also Semi-inter- 
quartile range 
r: see Pearson r 
Random numbers, use of in sampling, 245- 
246 
Random sample: see Sampling, simple
random sample
Range of talent:
and Pearson r, 400-402, 431-432
and standard error of estimate,
Rank-order scales, 65-68, 87
Range, as index of variability,
Real limits of intervals, 20-
Rectangular distribution: 29,
and measurement, 218
Rectilinear correlation: see Linearity of
regression
Region of rejection: see Critical region
Regression coefficient, least-square:
formula for, 416, 420, 441-442
and Pearson r, 420-421
for regression of X on Y, 442
Regression sum of squares: see Sum of
squares, regression
Relative frequency: 40-41, 44-45 
distribution, and normal curve, 181 


distribution, symbolic representation of, 
55-56 
and normal curve, 181, 202, 208-209 
and probability, 193, 195 
of score point, 
Repeated measures, on same individual: 111— 
112 
and normal curve test, 314-315 
and t-tests, 353 
Residuals: see Errors, of prediction, in
regression situation
Residual sum of squares: see Sum of squares,
residual
Rounding of measurements:
to last unit, 21-22, 34
to nearest unit, 20-21, 33


Sample size: 31, 101 

and approximate tests, 249, 335-337 

and appro 
tions, 257 

estimation of, and power, 297-301 

and Fisher's logarithmic transformation of
Pearson r, 46

and hypothesis-testing, 274-275, 335-337

and interval estimates, 325, 358 

and power, 345-346, 353, 356 

and sampling distribution of means, 249, 
253 


and sampling distribution of medians, 250,
330

and sampling distribution of Pearson r, 462

and sampling distribution of proportions,
1

and selection of classes, 30
and Type I and Type II errors, 301
and variability, 155-156 
Sampling: 5 
bias in, see Bias in sampling 
and confidence intervals, 323-324 
defined, 235-236 
errors in, see Sampling errors 
judgment method of, 244 
methods of selection in, 242-246 
and normal bivariate model, 461-462 
probability method of, 244 
and public opinion surveys, 233-234, 235
and random numbers, 245-246
in regression situation, 444—445, 461 
simple random sample, 244-246 
unit of, 236 
(See also Sampling distribution; Sampling 
theory) 
Sampling distribution: 
of difference between means, 256-257, 347 
of differences between random variables, 
253-256 
defined, 239-240 
estimates of standard error of, 261 
of Fisher’s logarithmic transformation of 
Pearson r, 463 
of means of random samples, 247-249, 250, 
253 
of medians of random samples, 249-251 
of Pearson r, 462 
of predicted Y-value in regression situa- 
tion, 453-454 
of proportions, 251-253 
of regression coefficient, 444—446 
of regression Y-intercept, 444—445, 447 
of t-statistic, see t-distribution 
unbiased estimates of variance of, 260 
of variances, 258 
(See also Sampling; Sampling theory) 




Sampling errors: 251, 254 
and bias, 240 
defined, 239 
and form of frequency distribution: 
and linear regression, 382, 411, 4:
445 


Sampling theory: 233-263 
applied to differences between means, 256— 
257, 346-353 
applied to differences between randomly- 
distributed variables, 253-256
applied to linear regression and correlation, 
443-468 


applied to means of random samples, 246- 
249, 250
applied to medians of random samples,
249-251
applied to proportions, 251-253
definitions of terms in, 234-2.
for small samples, 335-360; see also Small-
sample theory
(See also Sampling; Sampling distribution) 
Scales, in plotting:
histograms, 23, 45-46
ogives, 79-80
Scatter: see Variability 
Scatter diagram: 363-370, 418 
and degree of correlation, 365, 370 
and homoscedasticity, 428-429
and linearity of regression, 382 
and "smoothing," 411 
Schneider, Marvin, 301 
Scores, standard: see Standard scores 
Semi-interquartile range (Q): 136 
and variance, 150-153 
Sheppard's correction, 146 
Significance level: see Level of significance 
Significance, statistical: 309-310 
of difference between means, 305-308, 310- 
311, 313-319, 346-353
of difference between proportions, 319-320 
and extreme area, 312-313 
and non-statistical "significance," 310 
of obtained mean, 269-272, 274-275, 343- 


of obtained proportion, 275-281 
(See also Level of significance; Hypotheses, 
testing of) 
Size of interval: see Interval size 
Skewness of distributions: 27, 127-128 
and assigning letter grades, 230-231 
and critical region with t-tests, 356 
and grouping error in mean, 107—108 
and grouping error in medians, 130-131 
and mean, 120, 151 
and quartile points, 93 
and sampling distribution of Pearson r, 462 
and selection of an average, 119-121 
and selection of classes, 36-39 
and variance, 151-153 
Sligo, Joseph R., 319 
Slope parameter, in linear equation, 413; see 
also Regression coefficient
Small-sample theory, 335-360 
and large-sample theory, 353-356 
(See also Sample size; t-distribution; t-
statistic; t-test)
“Smoothing”: 227 
of frequency polygons, 30-32 
of sample ogives, in estimating population 
percentiles and percentile ranks, 83-85 
and scatter plots, 411 
Snedecor, G. W., v
Standard deviation: 
computation of, 140-144 


2 



defined, 140 

estimating population value of, 261 

and normal curve, 184-185, 186-190 

(See also Standard error; Variance)
Standard error: 

defined, 242 

of differences between means, 256, 261, 


347-348 
of differences between proportions, 6, 261 
of differences between random variables, 
254 


estimate of, for t-statistic, 338

of estimate, see Standard error of estimate 

estimates of population values of, 261 

of mean difference, 317, 350

of means of random samples, 248, 261, 325, 
337, 341 

of medians of random samples, 250, 261 

and precision of interval estimate 34 

of predicted Y-value in regression situa-
tion, 453-454

and power, 291-296, 334 

of proportions, 25: 261, 331 

of regression coefficient, 446—447 

of regression Y-intercept, 447 

and Type II errors, 301 

Standard error of estimate: 

computational procedures for, 428-430 

estimation of population value of, 428, 431 

and homoscedasticity, 428

as index of accuracy of prediction, 425-435 

and Pearson r, 430—434 

weakness of, as index of accuracy of pre- 
diction, 432 

Standard scores: 

defined, 158 

and form of distribution, 169-170 

general case of, 163-164

interpretation of, when derived from dif-
ferent raw score scales, 170-171

interpretation of, when derived from dif-
ferent reference groups, 168-170

and linear transformations, 160-162, 164-
5


157-176, 219 


and normalized T-scores, 227 
and percentile ranks, 170-171 
and predicted score in regression situation, 


proportionality property of, 160-161, 165 
and test-battery composite scores, 172-17
z-scores, 158-163; see also z-scores
Statistic: 207-268
choice of, 296

defined, 238
efficiency of, 296
(See also z-scores, as test statistic; t-statis-
tic)
Statistical inference, 5; see also Hypotheses,
testing of; Sampling theory
Statistical methods: 
aspects of instruction in, 6-8 
general nature of, 3-5 
study methods in, 9-11 


Statistical significance: see Significance, sta-
tistical


Student's distribution: see t-distribution


defined, 48 
rules regarding, 52-54
Sum of Squares: 
of deviations from mean, and variance; 


140-145; see also Standard deviation;
Variance 


ratio of two values of, and Pearson r, 440 


for regression, 434, 439, 440 


of data organized in subsets, 56-60 
of frequency distribution, 50- 


of relative frequency distribution, 55-56 


Tables: 
double-entry, and bivariate frequency dis- 
tribution, 370-372 
double-entry, in frequency tabulation, 41-
42 


and interpolation, 205-208 
of normal curve, use of, 190-19: 
of random numbers, use of, 
of t-distribution, use of, 341 

Taylor, J. E., 301 

t-distribution: 338-340

characteristics of, 339-340
equation of, 338
examples of, plotted, 33
and normal curve, 339-340
ordinates of, 338-339
tables for, use of, 341-343
(See also t-test; t-statistic) 

Test statistic: see Statistic 

Test unreliability, and errors in measure- 
ment, 22 

Thorndike, Edward L., 193 

Total sum of squares: see Sum of squares, for 
total 

Transformations: 

and effects on Pearson r, 399 

and forms of distributions, 217-218, 219 

and standard scores, 158-168 

(See also Linear transformations; Area 
transformations) 

Trend line in correlation, 377-388; see also
Regression equation, linear case;
Regression equation, curvilinear case;
Linearity of regression

27 


227 


computational procedures for, 343, 348— 

349, 350-351 

definition of, for normally-distributed sta-
tistic, 338

estimate of standard error for, 338

formulas for, 343, 348, 350, 447-448, 451

and interval estimation, 60

(See also t-distribution; t-test)

t-test:

and assumption of normality, 338, 343,
346-347, 350, 353-356

critical region with, and non-normal popu-
lation, 356

for differences between means, 346-353

and equality of variance, 346 349, 356

and equated groups, 349-35:

for individual prediction in regression sit-
uation, 457-458

for means, 343-346

and normal-curve tests, 353-356

for zero hypothesis concerning Pearson r,
450-4
and power, and Type II errors, 345, 352-
3, 356
for predicted Y-value in regression situa- 
tion, 454 
for regression coefficient, 447, 448-449 
for regression Y-intercept, 448, 449-450 
and repeated measures on same individual, 
353 
(See also t-distribution; t-statistic)
Type I errors: 281-284 
and approximate and exact tests, 336-337 
defined, 281 
and level of significance, 282 
and t-test with non-normal populations,
356
Type II errors: 281-287
and choice of critical region, 284-286, 297 
control of, 284-287 


defined, 281 
and power, 287 
and t-test, 345, 352-35: 


and variability of sampling distribution, 
926 


5 
(See also Power of statistical test)
"Typical" score, and median, 118, 120-121
Unbiased estimates:
of population means, medians, proportions, 
and differences in means and propor- 
tions, 257 
of population variance, 258-259
of variances of sampling distributions, 
260 
(See also Bias in sampling) 
Unit normal curve, 182-183, 191; see also 
Normal curve
Units of measurement: see Measurement, 
units of 
Universe and probability, 193-194, 197 


Variability: 134-156 
comparison of, 42-46, 94-95, 153 
and graphical comparison of two distribu- 
tions, 42-46 
indexes of, see Variability, indexes of 
and Pearson r, see Range of talent 
and sample size, 155-156 
and test-battery composite scores, 172-173 
Variability, indexes of: 
and comparing variability, 153 
and reliability of measurement or estimate, 
134-135, 153-156, 178 
uses of, 134-135, 153-156 
and units of measuring scales, 135 
(See also Mean deviation; Semi-inter-
quartile range; Standard deviation; 
Standard error; Variability; Variance) 
Variance: 139
pees, of, in regression situation, 434- 


computation of, 140-145 

equality of, and t-statistic, 346-347, 359 

and extreme scores, 151-153 

and grouping error, 146-147 

and homoscedasticity, see Homoscedastic-
ity

inequality of, and t-tests, 356

of predicted scores in regression situation,


of sampling distribution of Fisher’s log 
transform for Pearson r, 463 

of sampling distribution of predicted Y- 
value in regression situation, 453 

of scores plus a constant, 147-148 

of scores times a constant, 148-150

and semi-interquartile range, 150-153 

and Sheppard’s correction, 146-147 

and skewness, 151-152 


of subpopulations in regression situations,
427
unbiased estimate of population value of,
258-259 
of z-scores, 158-159, 162-163 


Walker, Helen M., 179, 212 
Weighting, of part-scores, 173-176 


Yates, F., viii, 516 
Yates correction, 253 


z-scores: 158-163

and assigning letter grades, 228-232 

defined, 158-160 

interpretation of, and form of distribution, 

4 

and linear regression equation, 420-422 

as linear transformations of raw scores, 
160-162 

mean of, 158-159, 1

and Pearson r

properties of,

as test statistic, 272-281, 335, 337, 353-356,
464, 467-468

variance of, 158-159, 162-163




